Abstract
In this paper, a differential evolution (DE)-based feature selection technique is developed for anaphora resolution in a resource-poor language, namely Bengali. We discuss the issues of adapting a state-of-the-art English anaphora resolution system to a resource-poor language like Bengali. The performance of any anaphora resolver depends greatly on the accuracy of its mention detector and on the use of appropriate features for anaphora resolution. We develop a number of models for mention detection based on machine learning and heuristics. In anaphora resolution there is no globally accepted metric for measuring performance, and the existing metrics, such as MUC, \(B^{3}\), CEAF and BLANC, exhibit significantly different behaviors. Our proposed feature selection technique determines a near-optimal feature set by optimizing each of these evaluation metrics. Experiments show how a language-dependent system (designed primarily for English) can attain a reasonably good performance level when re-trained and tested on a new language with a proper subset of features. Evaluation results yield F-measure values of 66.70, 59.47, 51.56, 33.08 and 72.75 % for MUC, \(B^{3}\), CEAFM, CEAFE and BLANC, respectively.
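To make the idea of DE-based feature selection concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a DE/rand/1/bin scheme over real-valued vectors in [0, 1] that are thresholded at 0.5 to obtain a feature mask; the abstract does not state the encoding or the control parameters, so `decode`, `de_feature_selection`, `toy_fitness` and all default values are hypothetical. In the setting described above, the fitness function would re-train the resolver on the selected features and return the F-measure of one metric (MUC, \(B^{3}\), CEAF or BLANC); here a toy function stands in for that step.

```python
import random
from typing import Callable, Sequence, Tuple

Mask = Tuple[bool, ...]


def decode(vector: Sequence[float]) -> Mask:
    """Map a real-valued DE vector to a feature mask (assumed 0.5 threshold)."""
    return tuple(x >= 0.5 for x in vector)


def de_feature_selection(
    fitness: Callable[[Mask], float],
    n_features: int,
    pop_size: int = 20,      # assumed value, must be at least 4
    generations: int = 50,   # assumed value
    f: float = 0.5,          # differential weight (assumed)
    cr: float = 0.9,         # crossover rate (assumed)
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximise `fitness` over feature masks with DE/rand/1/bin."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    scores = [fitness(decode(v)) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_features)  # forces at least one mutated gene
            trial = []
            for j in range(n_features):
                if j == j_rand or rng.random() < cr:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    value = min(1.0, max(0.0, value))  # clip to [0, 1]
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_score = fitness(decode(trial))
            # Greedy selection: keep the trial only if it is at least as good.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return decode(pop[best]), scores[best]


def toy_fitness(mask: Mask) -> float:
    """Stand-in for a metric F-measure: features 0, 2 and 5 help, others hurt."""
    useful = {0, 2, 5}
    hits = sum(1 for i, on in enumerate(mask) if on and i in useful)
    extras = sum(1 for i, on in enumerate(mask) if on and i not in useful)
    return hits - 0.1 * extras


best_mask, best_score = de_feature_selection(toy_fitness, n_features=8)
print(best_mask, best_score)
```

Running the search once per metric, each time with a different fitness function, would give one feature subset per metric, which matches the abstract's description of optimizing each evaluation metric separately.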
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-014-1397-3/MediaObjects/500_2014_1397_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-014-1397-3/MediaObjects/500_2014_1397_Fig2_HTML.gif)
Notes
Here B, I and O mark a token as the beginning of, inside of, or outside an entity mention, respectively.
Henceforth all Bengali glosses are written in ITRANS notation, described at http://www.aczoom.com/itrans/.
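As an illustration of the B/I/O labels mentioned in the first note, the sketch below converts a tag sequence into mention spans. It is a hypothetical helper, not code from the paper: the function name `bio_to_spans`, the lenient handling of an `I` tag that follows an `O`, and the example tag sequence are all assumptions made for illustration.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, one per mention."""
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open mention began
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new mention starts here; close any mention still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
        # An "I" tag inside an open mention simply extends it.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Six tokens: a three-token mention, one outside token, two one-token mentions.
print(bio_to_spans(["B", "I", "I", "O", "B", "B"]))
```

Treating a stray `I` as the start of a mention is one design choice among several; a stricter decoder could discard it instead.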
Additional information
Communicated by V. Loia.
About this article
Cite this article
Sikdar, U.K., Ekbal, A., Saha, S. et al. Differential evolution-based feature selection technique for anaphora resolution. Soft Comput 19, 2149–2161 (2015). https://doi.org/10.1007/s00500-014-1397-3
DOI: https://doi.org/10.1007/s00500-014-1397-3