Novel harmony search-based algorithms for part-of-speech tagging

  • Regular Paper
Knowledge and Information Systems

Abstract

Because fast, high-quality tagging is crucial in natural language processing, this paper presents novel language-independent algorithms based on the harmony search (HS) optimization method for the part-of-speech (PoS) tagging problem. The first proposed algorithm, called HSTAGger, is a framework for applying HS to PoS tagging. By modifying the HS algorithm and proposing more efficient objective functions, two improved versions of HSTAGger are also introduced. In addition, a novel class of problematic words, called erroneous, and a method for handling them are proposed, to the best of our knowledge for the first time. To demonstrate the effectiveness of the proposed algorithms, we applied them to a standard annotated corpus and compared them with other evolutionary and classical PoS-tagging approaches. Experimental results indicate that the proposed algorithms outperform the taggers previously presented in the literature in terms of average precision.
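
As an informal illustration of the general idea only (this is not the HSTAGger algorithm, whose details are given in the full text), a basic harmony search over tag sequences might be sketched as follows. The toy lexicon, the bigram transition table, the log-probability objective, the function names, and all parameter values (`hms`, `hmcr`, `par`, `iterations`) are assumptions made for this example.

```python
import math
import random

# Toy model with made-up numbers, for illustration only.
LEXICON = {
    "the": {"DET": 1.0},
    "can": {"NOUN": 0.3, "VERB": 0.2, "AUX": 0.5},
    "rusts": {"NOUN": 0.4, "VERB": 0.6},
}
TRANSITION = {
    ("<s>", "DET"): 0.6, ("DET", "NOUN"): 0.7, ("DET", "VERB"): 0.05,
    ("DET", "AUX"): 0.05, ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.1, ("AUX", "VERB"): 0.6,
    ("AUX", "NOUN"): 0.1,
}
FLOOR = 1e-4  # probability assumed for transitions missing from the table


def score(words, tags):
    """Log-probability of a tag sequence under the toy bigram model."""
    total, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        total += math.log(LEXICON[word][tag])
        total += math.log(TRANSITION.get((prev, tag), FLOOR))
        prev = tag
    return total


def harmony_search_tag(words, hms=5, hmcr=0.9, par=0.3, iterations=200, seed=0):
    """Return the best tag sequence found by a basic discrete harmony search."""
    rng = random.Random(seed)
    candidates = [sorted(LEXICON[w]) for w in words]
    # Harmony memory: hms randomly initialised tag sequences.
    memory = [[rng.choice(c) for c in candidates] for _ in range(hms)]
    for _ in range(iterations):
        new = []
        for i, options in enumerate(candidates):
            if rng.random() < hmcr:
                tag = rng.choice(memory)[i]      # memory consideration
                if rng.random() < par:
                    tag = rng.choice(options)    # discrete stand-in for pitch adjustment
            else:
                tag = rng.choice(options)        # random selection
            new.append(tag)
        # Replace the worst stored harmony if the new one scores higher.
        worst = min(memory, key=lambda h: score(words, h))
        if score(words, new) > score(words, worst):
            memory[memory.index(worst)] = new
    return max(memory, key=lambda h: score(words, h))
```

Calling `harmony_search_tag(["the", "can", "rusts"])` returns one tag per word, each drawn from that word's lexicon entry. Because the search is stochastic, the result is the best sequence found rather than a guaranteed optimum, and a fixed `seed` makes the run repeatable.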

Notes

  1. We have used the values proposed by [6] for crossover and mutation parameters.

  2. We have used our own implementations of the Random and LexProb algorithms for evaluation.

  3. A Java implementation of this system is freely available at ftp://ftp.cis.upenn.edu/pub/adwait/jmx/.

  4. The package is available at http://research.microsoft.com/en-us/downloads/25e1ecf0-8cfa-4106-ba25-51b0d501017d/default.aspx.

  5. www.lsi.upc.edu/~nlp/SVMTool.

References

  1. Attia M, Rashwan MAA, Al-Badrashiny MASAA (2009) Fassieh®, a semi-automatic visual interactive tool for morphological, PoS-tags, phonetic, and semantic annotation of Arabic text corpora. IEEE Trans Audio Speech Lang Process 17:916–925

  2. Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. ACM Press; Addison-Wesley, New York; Harlow, England

  3. DeRose SJ (1988) Grammatical category disambiguation by statistical optimization. Comput Linguist 14:31–39

  4. Francis WN, Kucera H (1979) Manual of information to accompany a standard corpus of present-day edited American English, for use with digital computers. Brown University, Providence

  5. Brants T (2000) TnT: a statistical part-of-speech tagger. In: Proceedings of the sixth conference on applied natural language processing, Seattle, Washington

  6. Alba E, Luque G, Araujo L (2006) Natural language tagging with genetic algorithms. Inform Process Lett 100:173–182

  7. Geem ZW, Tseng CL, Park Y (2005) Harmony search for generalized orienteering problem: best touring in China. Adv Nat Comput Pt 3 3612:741–750

  8. Geem ZW (2008) Novel derivative of harmony search algorithm for discrete design variables. Appl Math Comput 199:223–230

  9. Geem ZW (2009) Music-inspired harmony search algorithm: theory and applications, 1st edn. Springer, New York

  10. Forsati R, Mahdavi M, Haghighat AT, Ghariniyat A (2008) An efficient algorithm for bandwidth-delay constrained least cost multicast routing. Can Conf Elect Comput Eng CCECE 2008:1641–1646

  11. Forsati R, Mahdavi M, Shamsfard M, Meybodi MR (2013) Efficient stochastic algorithms for document clustering. Inform Sci 22:269–291. doi:10.1016/j.ins.2012.07.025

  12. Forsati R, Mahdavi M, Haghighat AT (2008) Harmony search based algorithms for bandwidth-delay-constrained least-cost multicast routing. Comput Commun 31:2505–2519

  13. Mahdavi M, Haghir Chehreghani M, Abolhassani H, Forsati R (2008) Novel meta-heuristic algorithms for clustering web documents. Appl Math Comput 201:441–451

  14. Mirkhani M, Forsati R, Shahri M, Moayedikia A (2013) A novel efficient algorithm for mobile robot localization. Robot Auton Syst 61:920–931

  15. Lee KS, Geem ZW (2005) A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Comput Methods Appl Mech Eng 194:3902–3933

  16. Forsati R, Shamsfard M, Mojtahedpour P (2010) An efficient meta heuristic algorithm for POS-tagging. In: 2010 fifth international multi-conference on computing in the global information technology, Spain

  17. Forney GD (1973) The Viterbi algorithm. Proc IEEE 61:268–278

  18. Brill E (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput Linguist 21:543–565

  19. Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging. In: Proceedings of the 2003 international conference on genetic and evolutionary computation, part II, USA

  20. Teller V (2000) Speech and language processing: an introduction to natural language processing, computational linguistics and speech recognition. Comput Linguist 26:638–641

  21. Araujo L (2004) Symbiosis of evolutionary techniques and statistical natural language processing. IEEE Trans Evol Comput 8:14–27

  22. Jurafsky D, Martin JH (2009) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Pearson Prentice Hall, Upper Saddle River

  23. van Halteren H, Daelemans W, Zavrel J (2001) Improving accuracy in word class tagging through the combination of machine learning systems. Comput Linguist 27:199–229

  24. Schütze H, Singer Y (1994) Part-of-speech tagging using a variable memory Markov model. In: Proceedings of the 32nd annual meeting of the Association for Computational Linguistics, Las Cruces, New Mexico

  25. Merialdo B (1994) Tagging English text with a probabilistic model. Comput Linguist 20:155–171

  26. Araujo L (2002) Part-of-speech tagging with evolutionary algorithms. In: Proceedings of the third international conference on computational linguistics and intelligent text processing

  27. Sarikaya R, Afify M, Deng Y, Erdogan H, Gao Y (2008) Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic. IEEE Trans Audio Speech Lang Process 16:1330–1339

  28. Schmid H (1994) Part-of-speech tagging with neural networks. In: Proceedings of the 15th conference on computational linguistics, vol 1, Kyoto, Japan

  29. Kudo T, Yamamoto K, Matsumoto Y (2004) Applying conditional random fields to Japanese morphological analysis. In: Proceedings of EMNLP’04, Barcelona, Spain

  30. Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge

  31. Charniak E (1993) Statistical language learning. MIT Press, Cambridge

  32. van Halteren H, Zavrel J, Daelemans W (1998) Improving data driven wordclass tagging by system combination. In: Proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics, vol 1, Montreal, Quebec, Canada

  33. Volk M, Schneider G (1998) Comparing a statistical and a rule-based tagger for German. In: Proceedings of KONVENS-98, Bonn

  34. Brill E (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput Linguist 21(4):543–565

  35. Lua KT (1996) Part of speech tagging of Chinese sentences using genetic algorithm. In: Conference on Chinese computing

  36. Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging. Genetic and Evolutionary Computation - GECCO 2003, Pt II, Proceedings 2724:1951–1962

  37. Jelinek F (1985) Markov source modeling of text generation. In: Skwirzynski JK (ed) The impact of processing techniques on communication. Nijhoff, Dordrecht, The Netherlands

  38. Carlberger J, Kann V (1999) Implementing an efficient part-of-speech tagger. Softw Pract Exper 29:815–832

  39. Lee GG, Cha J, Lee JH (2002) Syllable-pattern-based unknown-morpheme segmentation and estimation for hybrid part-of-speech tagging of Korean. Comput Linguist 28:53–70

  40. Pan QK, Suganthan PN, Tasgetiren MF (2010) A local-best harmony search algorithm with dynamic subpopulations. Engineering Optimization 42:101–117

  41. Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in genetic algorithms. Morgan Kaufmann Publishers Inc, Burlington

  42. [RSOnline] http://khnt.hit.uib.no/icame/manuals/brown/INDEX.HTM

  43. Araujo L, Luque G, Alba E (2004) Metaheuristics for natural language tagging. In: Deb K et al (eds) Genetic and Evolutionary Computation Conference (GECCO-2004), Seattle, Washington. Lecture Notes in Computer Science, vol 3102. Springer, Berlin, pp 889–900

  44. Marcus MP, Santorini B, Marcinkiewicz MA (1994) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19:313–330

  45. Geem ZW (2006) Optimal cost design of water distribution networks using harmony search. Engineering Optimization 38:259–277

  46. Kim JH, Geem ZW, Kim ES (2001) Parameter estimation of the nonlinear Muskingum model using harmony search. J Am Water Resour Assoc 37:1131–1138

  47. Forsati R, Mahdavi M (2010) Web text mining using harmony search. Recent Advances In Harmony Search Algorithm 2010:51–64

  48. Rosenfeld R (1996) A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language 10:187–228

  49. Aone C, Hausman K (1996) Unsupervised learning of a rule-based Spanish part of speech tagger. In: Proceedings of the 16th conference on computational linguistics, vol 1, Copenhagen, Denmark

  50. Daelemans W, Zavrel J, Berck P, Gillis S (1996) MBT: a memory-based part-of-speech tagger generator. In: Proceedings of the 4th workshop on very large corpora, pp 14–27

  51. Gao J, Johnson M (2008) A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers. In: 2008 conference on empirical methods in natural language processing, EMNLP, pp 344–352

  52. Gamback B, Olsson F, Argaw AA, Asker L (2009) Methods for Amharic part-of-speech tagging. In: Proceedings of the first workshop on language technologies for African languages (AfLaT 2009), Greece. Association for Computational Linguistics, pp 104–111

  53. Giménez J, Màrquez L (2004) SVMTool: a general POS tagger generator based on support vector machines. In: Proceedings of the 4th international conference on language resources and evaluation, Lisbon, Portugal, pp 43–46

Author information

Corresponding author

Correspondence to Rana Forsati.

About this article

Cite this article

Forsati, R., Shamsfard, M. Novel harmony search-based algorithms for part-of-speech tagging. Knowl Inf Syst 42, 709–736 (2015). https://doi.org/10.1007/s10115-013-0719-6

  • DOI: https://doi.org/10.1007/s10115-013-0719-6
