Abstract
Automatic part-of-speech tagging is the process of automatically assigning a part-of-speech (POS) tag to each word of a text. The words of a language are grouped into grammatical categories that represent the functions they can have in a sentence. These grammatical classes (or categories) are usually called parts of speech. However, in most languages, a large number of words can be used in different ways and thus have more than one possible part of speech. To choose the right tag for a particular word, a POS tagger must consider the parts of speech of the surrounding words. The neighboring words may themselves have more than one possible tag. To solve the problem, we therefore need a method to disambiguate each word’s set of possible tags. In this work, we model the part-of-speech tagging problem as a combinatorial optimization problem, which we solve using a genetic algorithm. The search for the best combinatorial solution is guided by a set of disambiguation rules that we first discover using a classification algorithm that also includes a genetic algorithm. Using rules to disambiguate the tagging, we were able to generalize the context information present in the training tables adopted by approaches based on probabilistic data. We were also able to incorporate other types of information that help to identify a word’s grammatical class. The results obtained on two different corpora are among the best published.
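To make the combinatorial framing concrete, the following is a minimal sketch of a genetic algorithm that searches over tag sequences for one sentence. It is not the chapter's implementation. The lexicon, the tag names, the rule scores, and every function name are hypothetical, and the hand-written score table merely stands in for the disambiguation rules that the chapter discovers with a classification algorithm.

```python
import random

# Hypothetical toy lexicon: each word maps to its candidate tags.
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boats": ["NOUN", "VERB"],
}

# Hypothetical context scores for (previous tag, current tag) pairs.
# These are illustrative stand-ins for learned disambiguation rules.
RULE_SCORES = {
    ("<s>", "DET"): 1.0,
    ("DET", "ADJ"): 1.0,
    ("DET", "NOUN"): 1.0,
    ("ADJ", "NOUN"): 1.5,
    ("NOUN", "VERB"): 1.0,
    ("NOUN", "NOUN"): 0.5,
    ("VERB", "NOUN"): 1.0,
}


def fitness(tags):
    """Sum the context scores of a full tag sequence."""
    total, prev = 0.0, "<s>"
    for tag in tags:
        total += RULE_SCORES.get((prev, tag), 0.0)
        prev = tag
    return total


def random_individual(words, rng):
    """Pick one candidate tag per word at random."""
    return [rng.choice(LEXICON[w]) for w in words]


def crossover(a, b, rng):
    """One-point crossover between two tag sequences."""
    if len(a) < 2:
        return a[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(individual, words, rng, rate=0.1):
    """Resample each position from that word's candidate tags with probability `rate`."""
    return [
        rng.choice(LEXICON[w]) if rng.random() < rate else tag
        for w, tag in zip(words, individual)
    ]


def tag_sentence(words, pop_size=20, generations=30, seed=0):
    """Evolve tag sequences and return the fittest one found."""
    rng = random.Random(seed)
    population = [random_individual(words, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), words, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    sentence = ["the", "old", "man", "boats"]
    best = tag_sentence(sentence)
    print(list(zip(sentence, best)), fitness(best))
```

Each individual encodes one complete tagging of the sentence, so an ambiguous word is always scored together with the tags chosen for its neighbors. This is the sense in which the tag of one word depends on tags that are themselves still undecided.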
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Silva, A.P., Silva, A., Rodrigues, I. (2015). An Approach to the POS Tagging Problem Using Genetic Algorithms. In: Madani, K., Correia, A., Rosa, A., Filipe, J. (eds) Computational Intelligence. IJCCI 2012. Studies in Computational Intelligence, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-319-11271-8_1
DOI: https://doi.org/10.1007/978-3-319-11271-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11270-1
Online ISBN: 978-3-319-11271-8
eBook Packages: Engineering, Engineering (R0)