Abstract
We present a practical method for adapting a named entity recognition (NER) system for Spanish to Portuguese. The method trains a machine learning algorithm, namely C4.5, on internal and external features. The external features are provided by a NER system for Spanish, while the internal features are automatically extracted from the documents. The experimental results show that the method performs well in both Spanish and Portuguese.
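The abstract does not list the individual features, so the following is only a minimal sketch of how internal and external features might be combined into one training instance per token. The feature names, the helper functions `internal_features` and `build_instances`, and the example tags are all hypothetical, chosen for illustration rather than taken from the paper. No learner is included; the resulting instances would be handed to a decision-tree learner such as C4.5.

```python
from typing import Dict, List


def internal_features(token: str, position: int) -> Dict[str, object]:
    """Surface cues computed from the token itself (hypothetical feature set)."""
    return {
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "is_sentence_initial": position == 0,
    }


def build_instances(tokens: List[str], external_tags: List[str]) -> List[Dict[str, object]]:
    """Merge internal features with the tag assigned by an external NER system."""
    if len(tokens) != len(external_tags):
        raise ValueError("tokens and external_tags must have the same length")
    instances = []
    for position, (token, tag) in enumerate(zip(tokens, external_tags)):
        features = internal_features(token, position)
        # External feature: the label a Spanish NER system gave this token.
        features["spanish_ner_tag"] = tag
        instances.append(features)
    return instances


# Portuguese sentence with made-up tags standing in for the Spanish system's output.
tokens = ["Lisboa", "fica", "em", "Portugal"]
spanish_tags = ["LOC", "O", "O", "LOC"]
instances = build_instances(tokens, spanish_tags)
```

Keeping the external tag as just another attribute lets the learner decide, token by token, when to trust the Spanish system and when the surface cues should override it.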
References
Carreras, X., Márquez, L., Padró, L.: Named entity recognition for Catalan using Spanish resources. In: 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), Budapest, Hungary (April 2003)
Carreras, X., Padró, L.: A flexible distributed architecture for natural language analyzers. In: Proceedings of LREC 2002, Las Palmas de Gran Canaria, Spain (2002)
Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81–106 (1986)
Quinlan, J.R.: C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo (1993)
Witten, I.H., Frank, E.: Data Mining, Practical Machine Learning Tools and Techniques with Java Implementations. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, San Francisco (1999)
Zhou, G., Su, J.: Named entity recognition using an HMM-based chunk tagger. In: Proceedings of ACL 2002, pp. 473–480 (2002)
Petasis, G., Cucchiarelli, A., Velardi, P., Paliouras, G., Karkaletsis, V., Spyropoulos, C.D.: Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 128–135. ACM Press, New York (2000)
Arévalo, M., Márquez, L., Martí, M.A., Padró, L., Simón, M.J.: A proposal for wide-coverage Spanish named entity recognition. Sociedad Española para el Procesamiento del Lenguaje Natural (28), 63–80 (2002)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Solorio, T., López, A.L. (2005). Learning Named Entity Recognition in Portuguese from Spanish. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_85
DOI: https://doi.org/10.1007/978-3-540-30586-6_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)