Differential evolution-based feature selection technique for anaphora resolution

  • Methodologies and Application
Soft Computing

Abstract

In this paper, a differential evolution (DE)-based feature selection technique is developed for anaphora resolution in a resource-poor language, namely Bengali. We discuss the issues involved in adapting a state-of-the-art English anaphora resolution system to a resource-poor language like Bengali. The performance of any anaphora resolver depends greatly on a highly accurate mention detector and on the use of appropriate features for anaphora resolution. We develop a number of models for mention detection based on machine learning and heuristics. In anaphora resolution there is no globally accepted metric for measuring performance, and the existing metrics, such as MUC, \(\hbox {B}^{3}\), CEAF and BLANC, exhibit significantly different behaviors. Our proposed feature selection technique determines a near-optimal feature set by optimizing each of these evaluation metrics. Experiments show how a language-dependent system (designed primarily for English) can attain a reasonably good level of performance when re-trained and tested on a new language with a proper subset of features. Evaluation yields F-measure values of 66.70, 59.47, 51.56, 33.08 and 72.75 % for MUC, \(\hbox {B}^{3}\), CEAFM, CEAFE and BLANC, respectively.
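As a rough illustration of the idea summarized above, the following minimal sketch shows how a DE/rand/1/bin loop can search over feature subsets. It is not the authors' implementation: the thresholding of real-valued vectors into binary masks, the parameter values, and the names `decode`, `differential_evolution_fs` and `toy_fitness` are all assumptions made for this example. In the setting of the paper, the fitness would be the F-measure of one coreference metric (for example MUC or BLANC) obtained by re-training the resolver on the selected features; here a toy function stands in for it.

```python
import random


def decode(vector, threshold=0.5):
    """Map a real-valued DE vector to a binary feature mask (assumed encoding)."""
    return [x >= threshold for x in vector]


def differential_evolution_fs(fitness, n_features, pop_size=20, generations=50,
                              f_weight=0.5, cr=0.9, seed=0):
    """Maximize fitness(mask) over feature masks with a DE/rand/1/bin loop."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)]
                  for _ in range(pop_size)]
    scores = [fitness(decode(ind)) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_features)  # forces at least one mutated gene
            trial = []
            for j in range(n_features):
                if rng.random() < cr or j == j_rand:
                    v = population[a][j] + f_weight * (population[b][j] - population[c][j])
                    v = min(1.0, max(0.0, v))  # keep genes inside [0, 1]
                else:
                    v = population[i][j]
                trial.append(v)
            trial_score = fitness(decode(trial))
            # Greedy selection: the trial replaces the target if it is no worse.
            if trial_score >= scores[i]:
                population[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return decode(population[best]), scores[best]


# Hypothetical stand-in for a metric-specific F-measure: features 0, 2 and 5
# help, every other selected feature costs half a point.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(1 for j, on in enumerate(mask) if on and j in USEFUL)
    noise = sum(1 for j, on in enumerate(mask) if on and j not in USEFUL)
    return hits - 0.5 * noise


mask, score = differential_evolution_fs(toy_fitness, n_features=8)
print(mask, score)
```

Running the search once per metric, each time with a fitness function that returns that metric's F-measure, would give one feature subset per metric, which matches the abstract's description of optimizing each evaluation metric separately.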

Notes

  1. http://ltrc.iiit.ac.in/icon2011/contests.html.

  2. Here B, I and O denote the beginning of, the inside of and the outside of the token sequence representing an entity mention.

  3. http://crfpp.sourceforge.net.

  4. Henceforth all the Bengali glosses are written in ITRANS notations available at http://www.aczoom.com/itrans/.

  5. http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php.

  6. http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php.
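The B/I/O encoding mentioned in note 2 can be illustrated with a short sketch. The function name `bio_encode`, the span convention (token offsets with an exclusive end) and the English example sentence are assumptions for illustration only; the tag set actually used in the paper may carry additional information.

```python
def bio_encode(tokens, mentions):
    """Return one B/I/O tag per token.

    mentions is a list of non-overlapping (start, end) token spans,
    with end exclusive (an assumed convention for this sketch).
    """
    tags = ["O"] * len(tokens)
    for start, end in mentions:
        tags[start] = "B"              # first token of the mention
        for k in range(start + 1, end):
            tags[k] = "I"              # remaining tokens inside the mention
    return tags


tokens = ["The", "old", "teacher", "said", "she", "was", "tired"]
mentions = [(0, 3), (4, 5)]            # "The old teacher" and "she"
tags = bio_encode(tokens, mentions)
print(list(zip(tokens, tags)))
```

A sequence labeler such as a CRF can then be trained to predict these tags token by token, which turns mention detection into a standard tagging problem.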

References

  • Anderson TW, Sclove S (1978) Introduction to the statistical analysis of data. Houghton Mifflin, Boston

  • Bagga A, Baldwin B (1998) Algorithms for scoring coreference chains. In: Proceedings of the LREC workshop on linguistic coreference, Granada, pp 563–566

  • Chatterji S, Dhar A, Barik B, Moumita PK, Sarkar S, Basu A (2011) Anaphora resolution for Bengali, Hindi, and Tamil using random tree algorithm in Weka. In: Proceedings of NLP Tools Contest on Anaphora Resolution in Indian Languages

  • Dakwale P, Sharma H (2011) Anaphora resolution in Indian languages using hybrid approaches. In: Proceedings of NLP Tools Contest on Anaphora Resolution in Indian Languages

  • Doddington G, Mitchell A, Przybocki M, Ramshaw L, Strassel S, Weischedel R (2004) The automatic content extraction (ACE) program: tasks, data, and evaluation. In: Proceedings of LREC

  • Ekbal A, Saha S, Uryupina O, Poesio M (2011) Multiobjective simulated annealing based approach for feature selection in anaphora resolution. In: Proceedings of the DAARC, pp 47–58

  • Ghosh A, Neogi S, Chakrabarty S, Bandyopadhyay S (2011) Anaphora resolution in Bengali. In: Proceedings of NLP Tools Contest on Anaphora Resolution in Indian Languages

  • Hoste V (2005) Optimization issues in machine learning of coreference resolution. PhD thesis, Antwerp University

  • Iida R, Inui K, Takamura H, Matsumoto Y (2003) Incorporating contextual cues in trainable models for coreference resolution. In: Proceedings of the EACL workshop on the computational treatment of Anaphora

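  • Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning (ICML). Morgan Kaufmann, San Francisco, pp 282–289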

  • Luo X (2005) On coreference resolution performance metrics. In: Proceedings of the NAACL/EMNLP, Vancouver

  • Luo X, Ittycheriah A, Jing H, Kambhatla N, Roukos S (2004) A mention-synchronous coreference resolution algorithm based on the Bell tree. In: Proceedings of the ACL, pp 135–142

  • McCarthy JF, Lehnert WG (1995) Using decision trees for coreference resolution. In: Proceedings of the fourteenth international joint conference on artificial intelligence, pp 1050–1055

  • Mitkov R (1999) Introduction: special issue on anaphora resolution in machine translation and multilingual NLP. Mach Transl 14:159–161

  • Morton TS (1999) Using coreference in question answering. In: Proceedings of the 8th text REtrieval conference (TREC-8), pp 85–89

  • Ng V, Cardie C (2002) Improving machine learning approaches to coreference resolution. In: Proceedings of the 40th annual meeting of the association for computational linguistics, pp 104–111

  • NLP tools contest on anaphora resolution in Indian languages organized in ICON-2011: 9th international conference on natural language processing, Anna University-MIT campus, Chromepet, Chennai, India, pp 16–19. http://ltrc.iiit.ac.in/icon2011/contests.html

  • Poesio M, Kabadjov MA (2004) A general-purpose, off-the-shelf anaphora resolution module: implementation and preliminary evaluation. In: Proceedings of LREC, pp 663–666

  • Ponzetto SP, Strube M (2006) Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In: Proceedings of the human language technology conference of the NAACL, Main Conference, New York City, USA, Association for Computational Linguistics, pp 192–199

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco

  • Recasens M, Hovy E (2011) BLANC: implementing the Rand index for coreference evaluation. Nat Lang Eng 17:485–510

  • Recasens M, Hovy E (2009) A deeper look into features for coreference resolution. In: Lalitha Devi S, Branco A, Mitkov R (eds.) Anaphora processing and applications (DAARC 2009. Number 5847 in LNAI). Springer, Berlin/Heidelberg, pp 29–42

  • Saha S, Ekbal A, Uryupina O, Poesio M (2011) Single and multi-objective optimization for feature selection in anaphora resolution. In: Proceedings of the fifth international joint conference on natural language processing (IJCNLP 2011), pp 93–101

  • Senapati A, Garain U (2011) Anaphora resolution system for Bengali by pronoun emitting approach. In: Proceedings of NLP Tools Contest on Anaphora Resolution in Indian Languages

  • Sha F, Pereira F (2003) Shallow parsing with conditional random fields. In: Proceedings of HLT-NAACL 2003, pp 213–220

  • Sikdar U, Ekbal A, Saha S, Uryupina O, Poesio M (2013) Adapting a state-of-the-art anaphora resolution system for resource-poor language. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing (IJCNLP), pp 815–821

  • Soon WM, Ng HT, Lim DCY (2001) A machine learning approach to coreference resolution of noun phrases. Comput Linguist 27(4):521–544

  • Steinberger J, Poesio M, Kabadjov MA, Ježek K (2007) Two uses of anaphora resolution in summarization. In: Information processing and management: an international journal, pp 1663–1680

  • Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359

  • Uryupina O (2007) Knowledge acquisition for coreference resolution. PhD thesis, University of the Saarland

  • Uryupina O (2010) Corry: a system for coreference resolution. In: Proceedings of the 5th international workshop on semantic evaluation (SemEval’10)

  • Versley Y (2006) A constraint-based approach to noun phrase coreference resolution in German newspaper text. In: Proceedings of Konferenz zur Verarbeitung Natürlicher Sprache, pp 143–150

  • Versley Y, Ponzetto SP, Poesio M, Eidelman V, Jern A, Smith J, Yang X, Moschitti A (2008) BART: a modular toolkit for coreference resolution. In: HLT-demonstrations ’08 proceedings of the 46th annual meeting of the association for computational linguistics on human language technologies, pp 9–12

  • Vilain M, Burger J, Aberdeen J, Connolly D, Hirschman L (1995) A model-theoretic coreference scoring scheme. In: Proceedings of the sixth message understanding conference, pp 45–52

  • Walker C, Strassel S, Medero J, Maeda K (2006) ACE 2005 multilingual training corpus. Linguistic Data Consortium, LDC2006T06, Philadelphia, Penn

  • Weischedel R, Pradhan S, Ramshaw L, Palmer M, Xue N, Marcus M, Taylor A, Greenberg C, Hovy E, Belvin R, Houston A (2008) OntoNotes release 2.0. Linguistic Data Consortium, LDC2008T04, Philadelphia, Penn

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques (Morgan Kaufmann Series in Data Management Systems), 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco

  • Yang X, Su J, Tan CL (2005) A twin-candidate model of coreference resolution with non-anaphor identification capability. In: Proceedings of IJCNLP, pp 719–730

  • Yang X, Zhou G, Su J, Tan CL (2003) Coreference resolution using competition learning approach. In: Proceedings of the 41st annual meeting of the association for computational linguistics, pp 176–183

Author information

Corresponding author

Correspondence to Asif Ekbal.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Sikdar, U.K., Ekbal, A., Saha, S. et al. Differential evolution-based feature selection technique for anaphora resolution. Soft Comput 19, 2149–2161 (2015). https://doi.org/10.1007/s00500-014-1397-3
