
ETL Ensembles for Chunking, NER and SRL

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2010)

Abstract

We present a new ensemble method that uses Entropy Guided Transformation Learning (ETL) as the base learner. The proposed approach, ETL Committee, combines the main ideas of Bagging and Random Subspaces. We also propose a strategy to include redundancy in transformation-based models. To evaluate the effectiveness of the ensemble method, we apply it to three Natural Language Processing tasks: Text Chunking, Named Entity Recognition and Semantic Role Labeling. Our experimental findings indicate that ETL Committee significantly outperforms single ETL models and achieves results competitive with the state of the art. Some positive characteristics of the proposed ensemble strategy are worth mentioning. First, it improves the effectiveness of ETL without any additional human effort. Second, it is particularly useful for very complex tasks that use large feature sets. Finally, the resulting training and classification processes are very easy to parallelize.
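The general idea of combining Bagging with Random Subspaces can be illustrated with a minimal sketch. It is not the ETL Committee implementation: the base learner below is a simple lookup-table stand-in rather than ETL, and the function names, the committee size, the feature fraction and the majority-vote combination are all assumptions made for illustration only.

```python
import random
from collections import Counter


def train_stub(examples, feature_ids):
    """Stand-in base learner (NOT ETL): memorise the most frequent label for
    each projected feature tuple; fall back to the sample's majority label."""
    counts = {}
    for features, label in examples:
        key = tuple(features[i] for i in feature_ids)
        counts.setdefault(key, Counter())[label] += 1
    table = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    fallback = Counter(label for _, label in examples).most_common(1)[0][0]

    def predict(features):
        key = tuple(features[i] for i in feature_ids)
        return table.get(key, fallback)

    return predict


def train_committee(examples, n_features, n_members=15,
                    feature_fraction=0.6, seed=0):
    """Train independent members, each on a bootstrap sample of the examples
    (Bagging) restricted to a random subset of the features (Random Subspaces).
    The default values are hypothetical, not taken from the paper."""
    rng = random.Random(seed)
    k = max(1, round(feature_fraction * n_features))
    members = []
    for _ in range(n_members):
        # Bagging: draw len(examples) items with replacement.
        sample = [rng.choice(examples) for _ in examples]
        # Random subspace: this member only sees k of the features.
        feature_ids = sorted(rng.sample(range(n_features), k))
        members.append(train_stub(sample, feature_ids))
    return members


def classify(members, features):
    """Combine member predictions by majority vote (an assumed scheme)."""
    votes = Counter(member(features) for member in members)
    return votes.most_common(1)[0][0]
```

Because each member is trained and applied without reference to the others, the loop in `train_committee` and the vote collection in `classify` could each be distributed across workers, which is consistent with the parallelization remark above.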




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

dos Santos, C.N., Milidiú, R.L., Crestana, C.E.M., Fernandes, E.R. (2010). ETL Ensembles for Chunking, NER and SRL. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_9


  • DOI: https://doi.org/10.1007/978-3-642-12116-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12115-9

  • Online ISBN: 978-3-642-12116-6

  • eBook Packages: Computer Science, Computer Science (R0)
