
The study of a nonstationary maximum entropy Markov model and its application on the pos-tagging task

Published: 01 September 2007

Abstract

Sequence labeling is a core task in natural language processing, and the maximum entropy Markov model (MEMM) is a powerful tool for it. This article enhances the traditional MEMM by exploiting the positional information of language elements: the stationary hypothesis of MEMM is relaxed, and the nonstationary MEMM (NS-MEMM) is proposed. Several related issues are discussed in detail, including the representation of positional information, NS-MEMM implementation, smoothing techniques, and space complexity. Furthermore, the asymmetric NS-MEMM offers a more flexible way to exploit positional information. In the experiments, NS-MEMM is evaluated on both Chinese and English pos-tagging tasks. According to the experimental results, NS-MEMM yields effective improvements over MEMM by exploiting positional information. The smoothing techniques in this article effectively alleviate the data-sparseness problem of NS-MEMM, and the asymmetric NS-MEMM brings further improvement by exploiting positional information more flexibly.
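The core idea of relaxing the stationary hypothesis, letting the maxent feature weights depend on where a token sits in the sentence instead of being shared across all positions, can be sketched as follows. This is a minimal, hypothetical Python illustration: the tag set, features, position buckets, and weights are invented for the example and are not the paper's actual model or parameter values.

```python
import math

# Toy tag set and a hand-set weight table. In a stationary MEMM the weight of
# a feature is the same at every position; the nonstationary variant lets
# features mention a coarse position bucket. All names and weights here are
# illustrative assumptions.
TAGS = ["NOUN", "VERB"]

def position_bucket(i, n):
    """Map an absolute position to a coarse bucket: begin / middle / end."""
    if i == 0:
        return "B"
    if i == n - 1:
        return "E"
    return "M"

# weights[(feature, tag)] -> float; bucket features encode positional info.
WEIGHTS = {
    ("prev=START", "NOUN"): 1.0,
    ("bucket=B", "NOUN"): 0.5,   # sentence-initial tokens lean NOUN
    ("bucket=E", "VERB"): 0.5,   # sentence-final tokens lean VERB
    ("word=runs", "VERB"): 1.5,
}

def local_probs(word, prev_tag, i, n, nonstationary=True):
    """P(tag | prev_tag, word, position) under a log-linear model."""
    feats = [f"prev={prev_tag}", f"word={word}"]
    if nonstationary:
        feats.append(f"bucket={position_bucket(i, n)}")
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in feats) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

def viterbi(words, nonstationary=True):
    """Standard Viterbi decoding over the locally normalized MEMM scores."""
    n = len(words)
    delta = {t: math.log(local_probs(words[0], "START", 0, n, nonstationary)[t])
             for t in TAGS}
    back = []
    for i in range(1, n):
        new_delta, ptr = {}, {}
        for t in TAGS:
            best_prev = max(
                TAGS,
                key=lambda p: delta[p] + math.log(
                    local_probs(words[i], p, i, n, nonstationary)[t]))
            ptr[t] = best_prev
            new_delta[t] = delta[best_prev] + math.log(
                local_probs(words[i], best_prev, i, n, nonstationary)[t])
        delta, back = new_delta, back + [ptr]
    # Reconstruct the best tag sequence from the backpointers.
    last = max(TAGS, key=lambda t: delta[t])
    seq = [last]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))
```

With `nonstationary=True`, the sentence-final bucket feature boosts the VERB probability for the last token beyond what the stationary model assigns, which is the kind of positional effect the article exploits. Smoothing (needed because position-specific counts are sparser) is omitted from this sketch.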


Cited By

  • (2018) Lightweight Context-Based Web-Service Composition Model for Mobile Devices. Digital Business, 199-222. DOI: 10.1007/978-3-319-93940-7_9
  • (2018) Attention-Based CNN-BLSTM Networks for Joint Intent Detection and Slot Filling. Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 250-261. DOI: 10.1007/978-3-030-01716-3_21
  • (2012) A probabilistic model with multi-dimensional features for object extraction. Frontiers of Computer Science 6(5), 513-526. DOI: 10.1007/s11704-012-1093-3

Published In

ACM Transactions on Asian Language Information Processing (TALIP), Volume 6, Issue 2
September 2007, 84 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1282080

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. MEMM
      2. Markov property
      3. Pos-tagging
      4. data sparseness problem
      5. stationary hypothesis
