Corpus based part-of-speech tagging

Lv, Chengyao; Liu, Huihua; Dong, Yuanxing; Chen, Yunliang

doi:10.1007/s10772-016-9356-2

Published: 01 August 2016

Volume 19, pages 647–654, (2016)

International Journal of Speech Technology

Chengyao Lv¹, Huihua Liu¹, Yuanxing Dong¹ & Yunliang Chen¹,²


Abstract

In natural language processing, a crucial subsystem in a wide range of applications is a part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabeled data. This article presents a brief account of the state of the art in POS tagging. POS tagging approaches use a labeled corpus to train computational models. Several typical models from three kinds of tagging are introduced: rule-based tagging, statistical approaches and evolutionary algorithms. The advantages and pitfalls of each kind of tagging are discussed and analyzed. Some rule-based and stochastic methods have achieved accuracies of 93–96%, while some evolutionary algorithms reach about 96–97%.
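To make the corpus-based idea concrete, the following is a minimal sketch of a most-frequent-tag baseline, the simplest tagger that learns from pre-annotated data. It is not one of the models surveyed in the article. The toy corpus, the tag names and the function names (`train_unigram_tagger`, `tag`, `accuracy`) are all assumptions introduced for illustration.

```python
from collections import Counter, defaultdict

# Toy pre-annotated corpus (hypothetical data, for illustration only).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]


def train_unigram_tagger(sentences):
    """Learn each word's most frequent tag, plus a fallback tag for unseen words."""
    word_tag_counts = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in sentences:
        for word, tag_label in sentence:
            word_tag_counts[word.lower()][tag_label] += 1
            tag_totals[tag_label] += 1
    model = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    default_tag = tag_totals.most_common(1)[0][0]  # most frequent tag overall
    return model, default_tag


def tag(words, model, default_tag):
    """Tag a list of words; unseen words receive the fallback tag."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


def accuracy(gold_sentences, model, default_tag):
    """Fraction of tokens whose predicted tag matches the gold annotation."""
    correct = total = 0
    for sentence in gold_sentences:
        words = [w for w, _ in sentence]
        predicted = tag(words, model, default_tag)
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += int(gold == pred)
            total += 1
    return correct / total if total else 0.0


model, default_tag = train_unigram_tagger(TRAIN)
print(tag(["The", "bird", "barks"], model, default_tag))
```

In this sketch the unseen word "bird" falls back to the most frequent tag in the toy corpus. Accuracy figures such as those quoted in the abstract would be measured on held-out annotated text, not on the training sentences.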

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61440018, 61501411) and the Hubei Natural Science Foundation (No. 2014CFB904).

Author information

Authors and Affiliations

School of Foreign Language, China University of Geosciences, Wuhan, 430074, China
Chengyao Lv, Huihua Liu, Yuanxing Dong & Yunliang Chen
School of Computer Science, China University of Geosciences, Wuhan, 430074, China
Yunliang Chen

Corresponding author

Correspondence to Yunliang Chen.

About this article

Cite this article

Lv, C., Liu, H., Dong, Y. et al. Corpus based part-of-speech tagging. Int J Speech Technol 19, 647–654 (2016). https://doi.org/10.1007/s10772-016-9356-2

Received: 10 April 2016
Accepted: 21 July 2016
Published: 01 August 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10772-016-9356-2
