Abstract
Dependency grammar is considered appropriate for many Indian languages. In this paper, we present a study of the dependency relations in the Bangla language. We categorize these relations at three levels: intrachunk relations, interchunk relations and interclause relations. Each level is further subcategorized, and an annotation scheme has been developed. Both syntactic and semantic features are taken into consideration in describing the relations. Our scheme has 63 such syntactico–semantic relations. We have verified the scheme by tagging a corpus of 4167 Bangla sentences to create a treebank (KGPBenTreebank).









Notes
The dependency grammar for the Bangla language and the Bangla treebank were created under the project “The Bangla Treebank”. This project is supported by the Linguistic Data Consortium for Indian Languages (LDC-IL), built by MHRD, Govt. of India, under the aegis of the Central Institute of Indian Languages, Mysore, India. For details, see http://www.cel.iitkgp.ernet.in/~oldtools/kgpbentreebank.html.
The list with detailed description of the dependency relations can be seen at http://corpus.quran.com/documentation/syntaxrelation.jsp
The annotation was done using the Sanchay annotation tool (Singh 2011).
References
Begum, R., Husain, S., Dhwaj, A., Misra, D., Bai, L., & Sangal, R. (2008). Dependency annotation scheme for Indian languages. In Proceedings of the third international joint conference on natural language processing (IJCNLP). Hyderabad, India.
Bharati, A., Chaitanya, V., & Sangal, R. (1999). Natural language processing: A Paninian perspective. New Delhi: Prentice-Hall of India.
Bharati, A., Sangal, R., Chaitanya, V., Kulkarni, A., Sharma, D. M., & Ramakrishnamacharyulu, K. V. (2002). Anncorra: Building tree-banks in Indian languages. In Proceedings of the 3rd workshop on Asian language resources and international standardization (Vol. 12, pp. 1–8), COLING ’02.
Bhatt, R., Narasimhan, B., Palmer, M., Rambow, O., Sharma, D. M., & Xia, F. (2009). A multi-representational and multi-layered treebank for Hindi/Urdu. In Proceedings of the third linguistic annotation workshop, ACL-IJCNLP ’09 (pp. 186–189). Association for Computational Linguistics, Stroudsburg, PA, USA. URL http://dl.acm.org/citation.cfm?id=1698381.1698417
Black, E., Eubank, S., Kashioka, H., Magerman, D., Garside, R., & Leech, G. (1996). Beyond skeleton parsing: Producing a comprehensive large-scale general-English treebank with full grammatical analysis. In Proceedings of the 17th international conference on computational linguistics (COLING-96), (pp. 107–112).
Black, E. W., Garside, R., & Leech, G. N. (Eds.) (1993). Statistically-driven computer grammars of English: The IBM/Lancaster approach. No. 8 in Language and Computers. Amsterdam. http://books.google.de/books?id=Hkzr-LYVz2wC&lpg=PR5&ots=QJhw16OVS4&dq=Statistically-driven%20computer%20grammars%20of%20English&lr&pg=PP1#v=onepage&q&f=false
Chakravarty, B. (2010). “Uchchatara bangla vyakaran”, a complete textbook on higher Bengali grammar. Akshay Malancha.
Charniak, E., Blaheta, D., Ge, N., Hall, K., Hale, J., & Johnson, M. (2000). BLLIP 1987–89 WSJ corpus release 1. Linguistic Data Consortium.
Chatterji, S., Sarkar, T. M., Sarkar, S., & Chakraborty, J. (2009). Karak relations in Bengali. In Proceedings of the 31st All-India conference of linguists (AICL 2009) (pp. 33–36). Hyderabad, India.
Chatterji, S. K. (2003). Bhasha-prakash bangala vyakaran [A grammar of the Bangla language]. Calcutta: Roopa and Company.
Chopde, A. (2000). Itrans “Indian language transliteration package”, a package for printing text in Indian language scripts. http://www.aczone.com/itrans/.
Dandapat, S., Sarkar, S., & Basu, A. (2004). A hybrid model for part-of-speech tagging and its application to Bengali. In International conference on computational intelligence (pp. 169–172).
de Marneffe, M., & Manning, C. D. (2008). Stanford typed dependencies manual.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Hajič, J., Böhmová, A., Hajičová, E., & Vidová-Hladká, B. (2000). The Prague Dependency Treebank: A three-level annotation scenario. In A. Abeillé (Ed.), Treebanks: Building and using parsed corpora (pp. 103–127). Amsterdam: Kluwer.
Hajič, J., Hajičová, E., & Rosen, A. (1996). Formal representation of language structures. TELRI Newsletter, 3, 12–19.
Hajič, J., Vidová-Hladká, B., & Pajas, P. (2001). The Prague Dependency Treebank: Annotation structure and support. In Proceedings of the IRCS workshop on linguistic databases (pp. 105–114). Philadelphia, USA: University of Pennsylvania.
Karlsson, F., Voutilainen, A., Heikkilä, J., & Anttila, A. (Eds.) (1995). Constraint Grammar: A language-independent system for parsing unrestricted text. Berlin: Mouton de Gruyter.
Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313–330. http://dl.acm.org/citation.cfm?id=972470.972475
McCord, M. C. (1990). Slot grammar: A system for simpler construction of practical natural language grammars. In R. Studer (Ed.), Natural Language and Logic: Proceedings of the international scientific symposium, Hamburg, FRG, (pp. 118–145). Berlin: Springer.
Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31, 71–106. doi:10.1162/0891201053630264.
Santorini, B., & Marcinkiewicz, M. A. (1991). Bracketing guidelines for the Penn Treebank project. Unpublished manuscript.
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.
Sharma, D. M., Sangal, R., Bai, L., Begam, R., & Ramakrishnamacharyulu, K. (2007). Anncorra: Treebanks for Indian languages, annotation guidelines (manuscript).
Singh, A. K. (2011). Part-of-speech annotation with Sanchay. In Proceedings of the national seminar on POS annotation for Indian languages: Issues & perspectives. Mysore, India.
Xue, N., Xia, F., Chiou, F. D., & Palmer, M. (2005). The Penn Chinese Treebank: Phrase structure annotation of a large corpus. In Natural Language Engineering.
Appendices
Appendix 1: The Relation set of the Bangla Treebank
Tag | Relation (English/Bangla) | Related with |
Intrachunk relations | ||
ppl | Postposition/Anusarga | Rel. with noun/pron. |
stc | Spatio-temp. con./Sthan-samay. samp. | Rel. with space-time noun |
vx | Auxiliary verb/Sahayak kriya | Related with verb |
pof | Part of/Kriya antargata bisheshya | Related with verb |
redup | Reduplication/Shabda dbaita | Rel. with same rhym. |
frag | Fragment/Bhagnamsha | Related with suffix |
Karak relations | ||
k1d | Doer subject/Kriya sampadak karta | Related with verb |
k1e | Experiencer subject/Anubhab karta | Related with verb |
k1p | Passive subject/Paroksha karta | Related with verb |
k1s | Noun of proposition/Samanadhikaran | Related with verb |
k1g | General subject/Sadharan karta | Related with verb |
k2t | Transitive object/sakarmak karma | Related with verb |
k2m | Direct object/Mukhya karma | Related with verb |
k2g | Indirect object/Gauna karma | Related with verb |
k2u | Purposive object/Uddyeshya karma | Related with verb |
k2s | Predicative object/Bidheya karma | Related with verb |
k3 | Instrumental/Karan | Related with verb |
k5p | Place rel. ablative/Sthanbachak apadan | Related with verb |
k5s | State rel. ablative/Abasthabachak apadan | Related with verb |
k5t | Time rel. ablative/Kalbachak apadan | Related with verb |
k5d | Dist. rel. ablative/Duratbabachak apadan | Related with verb |
k7p | Place rel. locative/Deshadhikaran | Related with verb |
k7t | Time rel. locative/Kaladhikaran | Related with verb |
k7d | Domain rel. locative/Bishayadhikaran | Related with verb |
k7s | State rel. locative/Bhabadhikaran | Related with verb |
rh | Reason/Hetu | Related with verb |
ru | Purpose/Uddeshya | Related with verb |
des | Destination/Gantabyasthal | Related with verb |
r6v | Possession/Dakhal | Related with verb |
compr | Comparison/Taratamya | Related with any |
sim | Similarity/Sadrishya | Related with any |
Modifier Relations | ||
r6 | Genitive/Sambandha | Related with noun |
ras | Associative relation/Saharthak sambandha | Related with noun |
rasneg | Non-associative relation/Namarthak sambandha | Related with noun |
nnmod | Noun noun modifier/Sanyogmulak bisheshya | Related with noun |
jnmod | Adj. noun mod./Bisheshyer bisheshan | Related with noun |
dnmod | Dem. noun mod./Nirnay suchak sarbanam | Related with noun |
pronmod | Pron. noun mod./Sarbanamjata bisheshan | Related with noun |
pnmod | Participial noun mod./Kridanta bisheshan | Related with noun |
anmod | App. noun mod./Tulyarupe sthapita bisheshan | Related with noun |
adv | Adv. mod./Kriya bisheshan jatiya bisheshan | Related with verb |
vmod | Verb-verb modifier/Kriya jatiya bisheshan | Related with verb |
neg | Negation modifier/Namarthak abyay | Related with verb |
acomp | Adjectival Complement/Bidheya bisheshan | Related with verb |
A few other interchunk relations | ||
ccof | Conjunct/Samyojak abyay | Rel. with conjunct |
pcc | Preconjunct/Abasthatmak abyay | Related with SC. |
rad | Address word/Sambodhan sabda | Related with verb |
par | Particle/Bakyalankar abyay | Related with verb |
qs | Question mark/Prashnabodhak chihna | Related with verb |
end | End/Samapti | Related with verb |
sym | Symbol/Chihna | Related with verb |
Interclause relations | ||
ref | Referent/Nirdesak | Rel. with noun/pron |
clausal* | Clausal star/Bakyamsha samagra | Related with verb |
clausalcomp | Clausal complement/Bakyamsha sampurak | Related with verb |
comp | Complementizer/Sampurak | Related with verb |
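The relation set above can be held in a small lookup structure, for example to check the labels in an annotated file against the scheme. The following is a minimal sketch in Python and not part of the treebank tooling: the names `TAGSET`, `GROUP_LEVEL` and `level_of` are hypothetical, and mapping the karak, modifier and "other" groups to the interchunk level is an assumption based on the order of the headings in this appendix.

```python
# Tags copied from Appendix 1, keyed by the appendix group headings.
TAGSET = {
    "intrachunk": ["ppl", "stc", "vx", "pof", "redup", "frag"],
    "karak": [
        "k1d", "k1e", "k1p", "k1s", "k1g",
        "k2t", "k2m", "k2g", "k2u", "k2s",
        "k3", "k5p", "k5s", "k5t", "k5d",
        "k7p", "k7t", "k7d", "k7s",
        "rh", "ru", "des", "r6v", "compr", "sim",
    ],
    "modifier": [
        "r6", "ras", "rasneg", "nnmod", "jnmod", "dnmod", "pronmod",
        "pnmod", "anmod", "adv", "vmod", "neg", "acomp",
    ],
    "other_interchunk": ["ccof", "pcc", "rad", "par", "qs", "end", "sym"],
    "interclause": ["ref", "clausal*", "clausalcomp", "comp"],
}

# Assumption: karak, modifier and "other" relations all hold between chunks.
GROUP_LEVEL = {
    "intrachunk": "intrachunk",
    "karak": "interchunk",
    "modifier": "interchunk",
    "other_interchunk": "interchunk",
    "interclause": "interclause",
}

# Reverse index from a tag to the appendix group it is listed under.
TAG_TO_GROUP = {tag: group for group, tags in TAGSET.items() for tag in tags}


def level_of(label: str) -> str:
    """Return the annotation level of a tag; raise ValueError if it is not listed."""
    try:
        return GROUP_LEVEL[TAG_TO_GROUP[label]]
    except KeyError:
        raise ValueError(f"unknown relation tag: {label!r}") from None
```

With this structure, `level_of("k1d")` gives the level of the doer-subject relation, and a label that does not appear in the table raises `ValueError`, which makes stray tags easy to spot during annotation checks.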
Appendix 2: Itrans to glyphs in Bangla and Hindi scripts mapping

Cite this article
Chatterji, S., Sarkar, T.M., Dhang, P. et al. A dependency annotation scheme for Bangla treebank. Lang Resources & Evaluation 48, 443–477 (2014). https://doi.org/10.1007/s10579-014-9266-3
DOI: https://doi.org/10.1007/s10579-014-9266-3