ETL Ensembles for Chunking, NER and SRL

dos Santos, Cícero N.; Milidiú, Ruy L.; Crestana, Carlos E. M.; Fernandes, Eraldo R.

doi:10.1007/978-3-642-12116-6_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We present a new ensemble method that uses Entropy Guided Transformation Learning (ETL) as the base learner. The proposed approach, ETL Committee, combines the main ideas of Bagging and Random Subspaces. We also propose a strategy to include redundancy in transformation-based models. To evaluate the effectiveness of the ensemble method, we apply it to three Natural Language Processing tasks: Text Chunking, Named Entity Recognition and Semantic Role Labeling. Our experimental findings indicate that ETL Committee significantly outperforms single ETL models and achieves results competitive with the state of the art. The proposed ensemble strategy has some positive characteristics worth mentioning. First, it improves the effectiveness of ETL without any additional human effort. Second, it is particularly useful for very complex tasks that use large feature sets. Finally, the resulting training and classification processes are very easy to parallelize.
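
The combination of Bagging and Random Subspaces named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the base learner below (`MajorityByFeatureLearner`) is a hypothetical stand-in for ETL, and the function names, the committee size, the feature fraction and the majority-vote combination are all assumptions made for illustration. Each committee member is trained on a bootstrap sample of the examples (the Bagging idea) restricted to a random subset of the features (the Random Subspaces idea).

```python
import random
from collections import Counter


def bootstrap_sample(examples, rng):
    """Bagging step: draw len(examples) items with replacement."""
    return [rng.choice(examples) for _ in examples]


def sample_features(feature_names, fraction, rng):
    """Random-subspace step: keep a random subset of the feature names."""
    k = max(1, round(fraction * len(feature_names)))
    return sorted(rng.sample(feature_names, k))


class MajorityByFeatureLearner:
    """Hypothetical stand-in for the base learner (NOT ETL).

    For each kept feature it memorises the most frequent label per
    feature value, then predicts by voting across the kept features.
    """

    def __init__(self, features):
        self.features = features
        self.tables = {}
        self.default = None

    def fit(self, examples):
        # Fallback label for feature values never seen in training.
        self.default = Counter(y for _, y in examples).most_common(1)[0][0]
        for f in self.features:
            counts = {}
            for x, y in examples:
                counts.setdefault(x[f], Counter())[y] += 1
            self.tables[f] = {v: c.most_common(1)[0][0]
                              for v, c in counts.items()}
        return self

    def predict(self, x):
        votes = Counter(self.tables[f].get(x[f], self.default)
                        for f in self.features)
        return votes.most_common(1)[0][0]


def train_committee(examples, feature_names, n_members=15,
                    fraction=0.7, seed=0):
    """Train independent members, each on its own bootstrap sample
    and its own random feature subset (all parameters are assumptions)."""
    rng = random.Random(seed)
    committee = []
    for _ in range(n_members):
        sample = bootstrap_sample(examples, rng)
        feats = sample_features(feature_names, fraction, rng)
        committee.append(MajorityByFeatureLearner(feats).fit(sample))
    return committee


def committee_predict(committee, x):
    """Combine member predictions by simple majority vote (an assumption)."""
    votes = Counter(member.predict(x) for member in committee)
    return votes.most_common(1)[0][0]
```

In this sketch no member depends on any other, so the loop in `train_committee` and the per-member calls in `committee_predict` could be distributed across workers, which is consistent with the abstract's remark that training and classification are easy to parallelize.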

Author information

Authors and Affiliations

Mestrado em Informática Aplicada – MIA, Universidade de Fortaleza – UNIFOR, Fortaleza, Brazil
Cícero N. dos Santos
Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro – PUC-Rio, Rio de Janeiro, Brazil
Ruy L. Milidiú, Carlos E. M. Crestana & Eraldo R. Fernandes
Laboratório de Automação, Instituto Federal de Educação, Ciência e Tecnologia de Goiás – IFG, Jataí, Brazil
Eraldo R. Fernandes

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

dos Santos, C.N., Milidiú, R.L., Crestana, C.E.M., Fernandes, E.R. (2010). ETL Ensembles for Chunking, NER and SRL. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_9

DOI: https://doi.org/10.1007/978-3-642-12116-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)
