Abstract
As a fast and high-quality tagger algorithm is a crucial task in natural language processing, this paper presents novel language-independent algorithms based on harmony search (HS) optimization method for handling the part-of-speech (PoS) tagging problem. The first proposed algorithm is a framework for applying HS to PoS-tagging which is called HSTAGger. By modifying HS algorithm and proposing more efficient objective functions, two improved versions of the HSTAGger are also introduced. In addition, a novel class of problematic words called erroneous as well as a method of handling them is proposed for the first time to the best of our knowledge. To demonstrate the effectiveness of the proposed algorithms, we have applied them on standard annotated corpus and compare them with other evolutionary-based and classical PoS-tagging approaches. Experimental results indicate that the proposed algorithms outperform the other taggers previously presented in the literature in terms of average precision.










Similar content being viewed by others
Notes
We have used the values proposed by [6] for crossover and mutation parameters.
We have used our own implementations of the Random and LexProb algorithms for evaluation.
Java implementation of this system is freely available at ftp://ftp.cis.upenn.edu/pub/adwait/jmx/.
The package is available at http://research.microsoft.com/en-us/downloads/25e1ecf0-8cfa-4106-ba25-51b0d501017d/default.aspx.
References
Attia M, Rashwan MAA, Al-Badrashiny MASAA (2009) Fassieh (R), a semi-automatic visual interactive tool for morphological, pos-tags, phonetic, and semantic annotation of arabic text corpora. IEEE Trans Audio Speech Lang Process 17:916–925
Baeza-Yate R, Ribeiro BN (1999) Modern information retrieval. ACM Press; Addison-Wesley, New York Harlow, England
DeRose SJ (1988) Grammatical category disambiguation by statistical optimization. Comput Linguist 14:31–39
Francis WN, Kucera H (1979) Manual of information to accompany a standard corpus of present-day edited american english, for use with digital computers. Brown University, Providence
Brants T (2000) TnT: a statistical part-of-speech tagger, presented at the Proceedings of the sixth conference on Applied natural language processing, Seattle, Washington
Alba E, Luque G, Araujo L (2006) Natural language tagging with genetic algorithms. Inform Process Lett 100:173–182
Geem ZW, Tseng CL, Park Y (2005) Proceedings Harmony search for generalized orienteering problem: best touring in china. Adv Nat Comput Pt 3 3612:741–750
Geem ZW (2008) Novel derivative of harmony search algorithm for discrete design variables. Appl Math Comput 199:223–230
Geem ZW (2009) Music-inspired harmony search algorithm : theory and applications, 1st edn. Springer, New York
Forsati R, Mahdavi M, Haghighat AT, Ghariniyat A (2008) An efficient algorithm for bandwidth-delay constrained least cost multicast routing. Can Conf Elect Comput Eng CCECE 2008:1641–1646
Forsati R, Mahdavi M, Shamsfard M, Meybodi MR (2013) Efficient stochastic algorithms for document clustering. Inform Sci 22:269–291. doi:10.1016/j.ins.2012.07.025
Forsati R, Mahdavi M, Haghigaht AT (2008) Harmony search based algorithms for bandwidth-delay-constrained least-cost multicast routing. Comput Commun 31:2505–2519
Mahdavi M, Haghir Chehreghani M, Abolhassani H, Forsati R (2008) Novel meta-heuristic algorithms for clustering web documents. Appl Math Comput 201:441–451
Mirkhani M, Forsati R, Shahri M, Moayedikia A (2013) A novel efficient algorithm for mobile robot localization. Robot Auton Syst 61:920–931
Lee KS, Geem ZW (2005) A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Comput Methods Appl Mech Eng 194:3902–3933
Forsati R, Shamsfard M, Mojtahedpour P (2010) An Efficient meta heuristic algorithm for POS-tagging. 2010 Fifth International Multi-conference on Computing in the Global Information Technology, Spain
Forney GD (1973) The Viterbi algorithm. Proc IEEE 61:268–278
Brill E (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput Linguist 21:543–565
Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging, presented at the Proceedings of the 2003 international conference on Genetic and evolutionary computation: PartII, USA.
Teller V (2000) Speech and language processing: An introduction to natural language processing, computational linguistics and speech recognition. Comput Linguistics 26:638–641
Araujo L (2004) Symbiosis of evolutionary techniques and statistical natural language processing. IEEE Trans Evol Comput 8:14–27
Jurafsky D, Martin JH (2009) Speech and language processing : an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Pearson Prentice Hall, Upper Saddle River
van Halteren H, Daelemans W, Zavrel J (2001) Improving accuracy in word class tagging through the combination of machine learning systems. Comput Linguist 27:199–229
Schütze H, Singer Y. (1994). Part-of-speech tagging using a variable memory markov model, presented at the Proceedings of the 32nd annual meeting on Association for Computational Linguistics, Las Cruces, New Mexico.
Merialdo B (1994) Tagging English text with a probabilistic model. Comput Linguist 20:155–171
Araujo L, (2002). Part-of-Speech tagging with evolutionary algorithms, presented at the Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing.
Sarikaya R, Afify M, Deng Y, Erdogan H, Gao Y (2008) Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic. IEEE Trans Audio Speech Lang Process 16:1330–1339
Schmid H. (1994). Part-of-speech tagging with neural networks, presented at the Proceedings of the 15th conference on Computational linguistics, vol. 1, Kyoto, Japan.
Kudo T, Yamamoto K, Matsumoto Y. (2004). “Applying conditional random fields to Japanese morphological analysis”, presented at the In Proc. of EMNLP’04, Barcelona, Spain.
Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
Charniak E (1993) Statistical language learning. MIT Press, Cambridge
Halteren HV, Zavrel J, Daelemans W. (1998). Improving data driven wordclass tagging by system combination, presented at the Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol. 1, Montreal, Quebec, Canada.
Martin Volk GS. (1998). Comparing a statistical and a rule-based tagger for German, presented at the In Proceedings of KONVENS-98, Bonn.
Brill E (1995) Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Comput Linguistics 21(4):543–565
Lua KT (1996) Part of speech tagging of Chinese sentences using genetic algorithm, presented at the Conference on Chinese Computing.
Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging. Genetic and Evolutionary Computation - Gecco 2003, Pt Ii, Proceedings 2724:1951–1962
Jelinek F (1985) Markov source modeling of text generation. Skwirzynski JK (ed) The Impact of Processing Techniques on Communication. Nijhoff, Dordrecht, The, Netherlands
Carlberger J, Kann V (1999) Implementing an efficient part-of-speech tagger. Softw. Pract. Exper. 29:815–832
Lee GG, Cha J, Lee JH (2002) Syllable-pattern-based unknown-morpheme segmentation and estimation for hybrid part-of-speech tagging of Korean. Comput. Linguist. 28:53–70
Pan QK, Suganthan PN, Tasgetiren MF (2010) A local-best harmony search algorithm with dynamic subpopulations. Engineering Optimization 42:101–117
Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in genetic algorithms. Morgan Kaufmann Publishers Inc, Burlington
[RSOnline] http://khnt.hit.uib.no/icame/manuals/brown/INDEX.HTM
Araujo L, Luque G, Alba E (2004) Metaheuristics for natural language tagging. In: Deb K et al (eds) Genetic and Evolutionary Computation Conference (GECCO-2004) Seattle, Washington, in: Lecture Notes in Computer Science, vol 3102., Springer, Berlin, pp 889–900
Marcus MP, Santorini B, Marcinkiewicz MA (1994) Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics 19:313–330
Geem ZW (2006) Optimal cost design of water distribution networks using harmony search. Engineering Optimization 38:259–277
Kim JH, Geem ZW, Kim ES (2001) Parameter estimation of the nonlinear muskingum model using harmony search. J. Am. Water Resour. Assoc. 37:1131–1138
Forsati R, Mahdavi M (2010) Web text mining using harmony search. Recent Advances In Harmony Search Algorithm 2010:51–64
Rosenfeld R (1996) A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language 10:187–228
Aone C, Hausman K. (1996). “Unsupervised learning of a rule-based Spanish part of speech tagger”, presented at the Proceedings of the 16th conference on Computational linguistics, vol. 1, Copenhagen, Denmark.
Daelemans W, Zavrel J, Berck P, Gillis S. (1996) “MBT: A memory-based part-of speech tagger generator”, Proceedings 4th Workshop on Very Large Corpora, pp. 14–27.
Gao J, Johnson M. (2008) “A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers”, 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 344–352.
Gamback B, Olsson F, Argaw AA, Asker L. (2009). “Methods for amharic part-of-speech tagging”, Proceedings of the First Workshop on Language Technologies for African Languages (AfLaT 2009), Greece: Association for Computational Linguistics, pp. 104–111.
Giménez J, Màrquez L. (2004). “SVMTool: A general POS tagger generator based on Support Vector Machines”, In Proceedings of the 4th International Conference on Language Resources and Evaluation, pp. 43–46, Lisbon, Portugal.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Forsati, R., Shamsfard, M. Novel harmony search-based algorithms for part-of-speech tagging. Knowl Inf Syst 42, 709–736 (2015). https://doi.org/10.1007/s10115-013-0719-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-013-0719-6