A dependency annotation scheme for Bangla treebank

Chatterji, Sanjay; Sarkar, Tanaya Mukherjee; Dhang, Pragati; Deb, Samhita; Sarkar, Sudeshna; Chakraborty, Jayshree; Basu, Anupam

doi:10.1007/s10579-014-9266-3

Original Paper
Published: 26 March 2014

Language Resources and Evaluation
Volume 48, pages 443–477 (2014)

Sanjay Chatterji¹,
Tanaya Mukherjee Sarkar¹,
Pragati Dhang¹,
Samhita Deb¹,
Sudeshna Sarkar¹,
Jayshree Chakraborty² &
Anupam Basu¹

Abstract

Dependency grammar is considered appropriate for many Indian languages. In this paper, we present a study of the dependency relations in the Bangla language. We have categorized these relations at three different levels, namely intrachunk relations, interchunk relations and interclause relations. Each of these levels is further subcategorized, and an annotation scheme has been developed. Both syntactic and semantic features have been taken into consideration in describing the relations. Our scheme contains 63 such syntactico-semantic relations. We have verified the scheme by tagging a corpus of 4167 Bangla sentences to create a treebank (KGPBenTreebank).

Notes

The dependency grammar for the Bangla language and the Bangla treebank were created under the project “The Bangla Treebank”. This project is supported by the Linguistic Data Consortium for Indian Languages (LDC-IL), built by MHRD, Govt. of India, under the aegis of the Central Institute of Indian Languages, Mysore, India. For details, see http://www.cel.iitkgp.ernet.in/~oldtools/kgpbentreebank.html.
The list of dependency relations, with detailed descriptions, can be seen at http://corpus.quran.com/documentation/syntaxrelation.jsp
The annotation was done using the Sanchay annotation tool of Singh (2011).

References

Begum, R., Husain, S., Dhwaj, A., Misra, D., Bai, L., & Sangal, R. (2008). Dependency annotation scheme for Indian languages. In Proceedings of the third international joint conference on natural language processing (IJCNLP). Hyderabad, India.
Bharati, A., Chaitanya, V., & Sangal, R. (1999). Natural language processing: A Paninian perspective. New Delhi: Prentice-Hall of India.
Bharati, A., Sangal, R., Chaitanya, V., Kulkarni, A., Sharma, D. M., & Ramakrishnamacharyulu, K. V. (2002). Anncorra: Building tree-banks in Indian languages. In Proceedings of the 3rd workshop on Asian language resources and international standardization (Vol. 12, pp. 1–8), COLING ’02.
Bhatt, R., Narasimhan, B., Palmer, M., Rambow, O., Sharma, D. M., & Xia, F. (2009). A multi-representational and multi-layered treebank for Hindi/Urdu. In Proceedings of the third Linguistic annotation workshop, ACL-IJCNLP ’09 (pp. 186–189). Association for Computational Linguistics, Stroudsburg, PA, USA. URL http://dl.acm.org/citation.cfm?id=1698381.1698417
Black, E., Eubank, S., Kashioka, H., Magerman, D., Garside, R., & Leech, G. (1996). Beyond skeleton parsing: Producing a comprehensive large-scale general-English treebank with full grammatical analysis. In Proceedings of the 17th international conference on computational linguistics (COLING-96), (pp. 107–112).
Black, E. W., Garside, R., & Leech, G. N. (Eds.) (1993). Statistically-driven computer grammars of English: The IBM/Lancaster approach. No. 8 in Language and Computers. Amsterdam. http://books.google.de/books?id=Hkzr-LYVz2wC&lpg=PR5&ots=QJhw16OVS4&dq=Statistically-driven%20computer%20grammars%20of%20English&lr&pg=PP1#v=onepage&q&f=false
Chakravarty, B. (2010). “uchchatara bangla vyakaran”, a complete text book on higher Bengali grammar. Akshay Malancha.
Charniak, E., Blaheta, D., Ge, N., Hall, K., Hale, J., & Johnson, M. (2000). BLLIP 1987–89 WSJ corpus release 1. Linguistic Data Consortium.
Chatterji, S., Sarkar, T. M., Sarkar, S., & Chakraborty, J. (2009). Karak relations in Bengali. In Proceedings of 31st All-India conference of Linguists (AICL 2009) (pp. 33–36). Hyderabad, India.
Chatterji, S. K. (2003). Bhasha-prakash bangala vyakaran [a grammar of the bangla language]. Calcutta: Roopa and Company.
Chopde, A. (2000). Itrans “Indian language transliteration package”, a package for printing text in Indian language scripts. http://www.aczone.com/itrans/.
Dandapat, S., Sarkar, S., & Basu, A. (2004). A hybrid model for part-of-speech tagging and its application to Bengali. In International conference on computational intelligence (pp. 169–172).
de Marneffe, M., & Manning, C. D. (2008). Stanford typed dependencies manual.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Hajič, J., Böhmová, A., Hajičová, E., & Vidová-Hladká, B. (2000). The Prague Dependency Treebank: A three-level annotation scenario. In A. Abeillé (Ed.), Treebanks: Building and using parsed corpora (pp. 103–127). Amsterdam: Kluwer.
Hajič, J., Hajičová, E., & Rosen, A. (1996). Formal representation of language structures. TELRI Newsletter, 3, 12–19.
Hajič, J., Vidová-Hladká, B., & Pajas, P. (2001). The Prague Dependency Treebank: Annotation structure and support. In Proceedings of the IRCS Workshop on Linguistic Databases (pp. 105–114). Philadelphia, USA: University of Pennsylvania.
Karlsson, F., Voutilainen, A., Heikkilä, J., & Anttila, A. (Eds.) (1995). Constraint Grammar: A language-independent system for parsing unrestricted text. Berlin: Mouton de Gruyter.
Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313–330. http://dl.acm.org/citation.cfm?id=972470.972475
McCord, M. C. (1990). Slot grammar: A system for simpler construction of practical natural language grammars. In R. Studer (Ed.), Natural Language and Logic: Proceedings of the international scientific symposium, Hamburg, FRG (pp. 118–145). Berlin: Springer.
Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31, 71–106. doi:10.1162/0891201053630264.
Santorini, B., & Marcinkiewicz, M. A. (1991). Bracketing guidelines for the Penn Treebank project. Unpublished manuscript.
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.
Sharma, D. M., Sangal, R., Bai, L., Begam, R., & Ramakrishnamacharyulu, K. (2007). Anncorra: Treebanks for Indian languages, annotation guidelines (manuscript).
Singh, A. K. (2011). Part-of-speech annotation with Sanchay. In Proceedings of the National Seminar On POS annotation for Indian Languages: Issues & Perspectives. Mysore, India.
Xue, N., Xia, F., Chiou, F. D., & Palmer, M. (2005). The Penn Chinese Treebank: Phrase structure annotation of a large corpus. In Natural Language Engineering.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
Sanjay Chatterji, Tanaya Mukherjee Sarkar, Pragati Dhang, Samhita Deb, Sudeshna Sarkar & Anupam Basu
Humanities and Social Sciences, Indian Institute of Technology, Kharagpur, India
Jayshree Chakraborty

Corresponding author

Correspondence to Sanjay Chatterji.

Appendices

Appendix 1: The Relation set of the Bangla Treebank

Intrachunk relations
ppl	Postposition/Anusarga	Rel. with noun/pron.
stc	Spatio-temp. con./Sthan-samay. samp.	Rel. with space-time noun
vx	Auxiliary verb/Sahayak kriya	Related with verb
pof	Part of/Kriya antargata bisheshya	Related with verb
redup	Reduplication/Shabda dbaita	Rel. with same rhym.
frag	Fragment/Bhagnamsha	Related with suffix
Karak relations
k1d	Doer subject/Kriya sampadak karta	Related with verb
k1e	Experiencer subject/Anubhab karta	Related with verb
k1p	Passive subject/Paroksha karta	Related with verb
k1s	Noun of proposition/Samanadhikaran	Related with verb
k1g	General subject/Sadharan karta	Related with verb
k2t	Transitive object/sakarmak karma	Related with verb
k2m	Direct object/Mukhya karma	Related with verb
k2g	Indirect object/Gauna karma	Related with verb
k2u	Purposive object/Uddyeshya karma	Related with verb
k2s	Predicative object/Bidheya karma	Related with verb
k3	Instrumental/Karan	Related with verb
k5p	Place rel. ablative/Sthanbachak apadan	Related with verb
k5s	State rel. ablative/Abasthabachak apadan	Related with verb
k5t	Time rel. ablative/Kalbachak apadan	Related with verb
k5d	Dist. rel. ablative/Duratbabachak apadan	Related with verb
k7p	Place rel. locative/Deshadhikaran	Related with verb
k7t	Time rel. locative/Kaladhikaran	Related with verb
k7d	Domain rel. locative/Bishayadhikaran	Related with verb
k7s	State rel. locative/Bhabadhikaran	Related with verb
rh	Reason/Hetu	Related with verb
ru	Purpose/Uddeshya	Related with verb
des	Destination/Gantabyasthal	Related with verb
r6v	Possession/Dakhal	Related with verb
compr	Comparison/Taratamya	Related with any
sim	Similarity/Sadrishya	Related with any
Modifier Relations
r6	Genitive/Sambandha	Related with noun
ras	Associative relation/Saharthak sambandha	Related with noun
rasneg	Non-associative relation/Namarthak sambandha	Related with noun
nnmod	Noun noun modifier/Sanyogmulak bisheshya	Related with noun
jnmod	Adj. noun mod./Bisheshyer bisheshan	Related with noun
dnmod	Dem. noun mod./Nirnay suchak sarbanam	Related with noun
pronmod	Pron. noun mod./Sarbanamjata bisheshan	Related with noun
pnmod	Participial noun mod./Kridanta bisheshan	Related with noun
anmod	App. noun mod./Tulyarupe sthapita bisheshan	Related with noun
adv	Adv. mod./Kriya bisheshan jatiya bisheshan	Related with verb
vmod	Verb-verb modifier/Kriya jatiya bisheshan	Related with verb
neg	Negation modifier/Namarthak abyay	Related with verb
acomp	Adjectival Complement/Bidheya bisheshan	Related with verb
Few other interchunk relations
ccof	Conjunct/Samyojak abyay	Rel. with conjunct
pcc	Preconjunct/Abasthatmak abyay	Related with SC.
rad	Address word/Sambodhan sabda	Related with verb
par	Particle/Bakyalankar abyay	Related with verb
qs	Question mark/Prashnabodhak chihna	Related with verb
end	End/Samapti	Related with verb
sym	Symbol/Chihna	Related with verb
Interclause relation
ref	Referent/Nirdesak	Rel. with noun/pron.
clausal*	Clausal star/Bakyamsha samagra	Related with verb
clausalcomp	Clausal complement/Bakyamsha sampurak	Related with verb
comp	Complementizer/Sampurak	Related with verb

rel.-related, pron.-pronoun, rhym.-rhyming word, mod.-modifier, adj.-adjectival, dem.-demonstrative, app.-appositional, adv.-adverbial, nom.-nominal, bish.-bisheshan, comp.-comparison, sim.-similarity, SC.-subordinating conjunction, temp.-Temporal, con.-Connection, samay.-Samaygata, samp.-Samparka
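
The relation set above lends itself to a simple lookup structure. The following is a minimal sketch in Python, not part of the published scheme or the treebank tooling: the tags are copied from Appendix 1, while the group keys and the names `RELATION_GROUPS`, `TAG_TO_GROUP`, `group_of` and `count_by_group` are hypothetical and chosen only for illustration. It shows how the tags of an annotated sentence might be grouped by the table headings.

```python
from collections import Counter

# Tags copied from Appendix 1, keyed by the table's section headings.
# The key names themselves are illustrative assumptions.
RELATION_GROUPS = {
    "intrachunk": ["ppl", "stc", "vx", "pof", "redup", "frag"],
    "karak": [
        "k1d", "k1e", "k1p", "k1s", "k1g",
        "k2t", "k2m", "k2g", "k2u", "k2s",
        "k3", "k5p", "k5s", "k5t", "k5d",
        "k7p", "k7t", "k7d", "k7s",
        "rh", "ru", "des", "r6v", "compr", "sim",
    ],
    "modifier": [
        "r6", "ras", "rasneg", "nnmod", "jnmod", "dnmod", "pronmod",
        "pnmod", "anmod", "adv", "vmod", "neg", "acomp",
    ],
    "other_interchunk": ["ccof", "pcc", "rad", "par", "qs", "end", "sym"],
    "interclause": ["ref", "clausal*", "clausalcomp", "comp"],
}

# Invert the mapping so each tag points at its group.
TAG_TO_GROUP = {
    tag: group for group, tags in RELATION_GROUPS.items() for tag in tags
}


def group_of(tag: str) -> str:
    """Return the Appendix 1 group of a tag, or 'unknown' if it is not listed."""
    return TAG_TO_GROUP.get(tag, "unknown")


def count_by_group(tags):
    """Count how many of the given tags fall into each group."""
    return Counter(group_of(tag) for tag in tags)
```

For example, `group_of("k1d")` returns `"karak"`, and a tag that does not appear in the table falls into `"unknown"` rather than raising an error, which may be convenient when checking annotated files for stray labels.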

Appendix 2: Itrans to glyphs in Bangla and Hindi scripts mapping

About this article

Cite this article

Chatterji, S., Sarkar, T.M., Dhang, P. et al. A dependency annotation scheme for Bangla treebank. Lang Resources & Evaluation 48, 443–477 (2014). https://doi.org/10.1007/s10579-014-9266-3

Published: 26 March 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10579-014-9266-3
