MODE: multiobjective differential evolution for feature selection and classifier ensemble

  • Focus
Soft Computing

Abstract

In this paper, we propose multiobjective differential evolution (MODE)-based approaches to feature selection and ensemble learning for entity extraction in biomedical texts. The first step of the algorithm addresses automatic feature selection for a machine learning framework, namely conditional random fields (CRF). The final Pareto optimal front obtained as the output of the feature selection module contains a set of solutions, each of which represents a particular feature representation and hence yields a distinct CRF-based classifier. In the second step, we combine a subset of these classifiers using a MODE-based ensemble technique. Experiments on three benchmark datasets, namely GENIA, GENETAG and AIMed, yield F-measure values of 76.75, 94.15 and 91.91 %, respectively. Comparisons with existing systems show that the proposed algorithm achieves performance on par with the state of the art. These results also indicate that the method is general in nature and therefore performs well on datasets from several domains. The key contribution of this work is the development of generalized MODE-based feature selection and ensemble learning techniques for extracting entities from biomedical texts of several domains.
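
The feature selection step can be illustrated with a minimal sketch of multiobjective differential evolution over binary feature masks. This is not the authors' implementation. The real-valued encoding thresholded at 0.5, the two maximized objectives (for example recall and precision on a development set), the parameter values F = 0.5 and CR = 0.9, and the names `mode_feature_selection`, `toy_evaluate` and `RELEVANT` are all hypothetical. The `evaluate` callback stands in for training a CRF with the selected features and scoring it; `toy_evaluate` replaces that with a synthetic score so the example runs on its own. Variation uses the standard DE/rand/1/bin scheme, and selection keeps non-dominated fronts, truncating the last one by objective sum as a simplification of NSGA-II crowding distance.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Two objectives, both maximized (hypothetically, recall and precision on a dev set).
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Objectives]) -> List[int]:
    """Positions of the non-dominated entries in `scores`."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


def to_mask(genes: Sequence[float]) -> List[int]:
    """Hypothetical encoding: feature k is selected when its gene exceeds 0.5."""
    return [1 if g > 0.5 else 0 for g in genes]


def mode_feature_selection(n_features: int,
                           evaluate: Callable[[List[int]], Objectives],
                           pop_size: int = 20, generations: int = 30,
                           f: float = 0.5, cr: float = 0.9,
                           seed: int = 0) -> List[Tuple[List[int], Objectives]]:
    """Return (feature mask, objectives) pairs on the final Pareto front."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    scores = [evaluate(to_mask(x)) for x in pop]
    for _ in range(generations):
        trials = []
        for i in range(pop_size):
            # DE/rand/1/bin: three distinct donors other than the target vector i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_features)  # at least one gene comes from the mutant
            trial = []
            for k in range(n_features):
                if k == j_rand or rng.random() < cr:
                    v = pop[r1][k] + f * (pop[r2][k] - pop[r3][k])
                    trial.append(min(1.0, max(0.0, v)))  # keep genes in [0, 1]
                else:
                    trial.append(pop[i][k])
            trials.append(trial)
        # Environmental selection: pool parents and trials, fill survivors front by front.
        merged = pop + trials
        merged_scores = scores + [evaluate(to_mask(t)) for t in trials]
        remaining = list(range(len(merged)))
        survivors: List[int] = []
        while len(survivors) < pop_size:
            sub = [merged_scores[r] for r in remaining]
            front = [remaining[p] for p in pareto_front(sub)]
            # If a front does not fit completely, keep the members with the largest
            # objective sum (a simplification; NSGA-II uses crowding distance).
            front.sort(key=lambda r: sum(merged_scores[r]), reverse=True)
            survivors.extend(front[:pop_size - len(survivors)])
            remaining = [r for r in remaining if r not in front]
        pop = [merged[r] for r in survivors]
        scores = [merged_scores[r] for r in survivors]
    return [(to_mask(pop[i]), scores[i]) for i in pareto_front(scores)]


# Hypothetical stand-in for "train a CRF on these features and score it".
RELEVANT = {0, 2, 5}


def toy_evaluate(mask: List[int]) -> Objectives:
    chosen = {k for k, bit in enumerate(mask) if bit}
    if not chosen:
        return (0.0, 0.0)
    hits = len(chosen & RELEVANT)
    return (hits / len(RELEVANT), hits / len(chosen))  # (recall-like, precision-like)


if __name__ == "__main__":
    for mask, objs in mode_feature_selection(8, toy_evaluate):
        print(mask, objs)
```

Each returned mask is one point on the final Pareto front. In the approach described above, each such solution would correspond to one CRF classifier; the second, MODE-based ensemble step is not covered by this sketch.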

Figs. 1–9

Notes

  1. http://research.nii.ac.jp/~collier/workshops/JNLPBA04st.htm.

  2. ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/GENEATG.tar.gz.

  3. ftp://ftp.cs.utexas.edu/pub/mooney/bio-data/interactions.tar.gz.

  4. http://research.nii.ac.jp/~collier/workshops/JNLPBA04st.htm.

  5. ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/GENEATG.tar.gz.

  6. ftp://ftp.cs.utexas.edu/pub/mooney/bio-data/interactions.tar.gz.

  7. A part of each training set is used as the development set.

  8. http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger.

  9. http://research.nii.ac.jp/~collier/workshops/JNLPBA04st.htm.

  10. ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/GENEATG.tar.gz.

  11. ftp://ftp.cs.utexas.edu/pub/mooney/bio-data/interactions.tar.gz.

  12. http://research.nii.ac.jp/~collier/workshops/JNLPBA04st.htm.

  13. ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/GENEATG.tar.gz.

  14. http://crfpp.sourceforge.net.

  15. http://www.iitp.ac.in/index.php/schools-and-centers/engineering/computer-science-a-engineering/people/faculty/dr-sriparna-saha.html.

  16. http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/ERtask/report.html.

  17. http://www.biocreative.org/news/biocreative-ii/.

Author information

Corresponding author

Correspondence to Asif Ekbal.

Additional information

Communicated by E. Lughofer.

About this article

Cite this article

Sikdar, U.K., Ekbal, A. & Saha, S. MODE: multiobjective differential evolution for feature selection and classifier ensemble. Soft Comput 19, 3529–3549 (2015). https://doi.org/10.1007/s00500-014-1565-5

  • DOI: https://doi.org/10.1007/s00500-014-1565-5
