
The study of a nonstationary maximum entropy Markov model and its application on the pos-tagging task

Published: 01 September 2007

Abstract

Sequence labeling is a core task in natural language processing, and the maximum entropy Markov model (MEMM) is a powerful tool for it. This article enhances the traditional MEMM by exploiting the positional information of language elements: the stationary hypothesis of MEMM is relaxed, and the nonstationary MEMM (NS-MEMM) is proposed. Several related issues are discussed in detail, including the representation of positional information, NS-MEMM implementation, smoothing techniques, and space complexity. Furthermore, the asymmetric NS-MEMM offers a more flexible way to exploit positional information. In the experiments, NS-MEMM is evaluated on both Chinese and English pos-tagging tasks. According to the experimental results, NS-MEMM yields effective improvements over MEMM by exploiting positional information. The smoothing techniques in this article effectively alleviate the data-sparseness problem of NS-MEMM, and the asymmetric NS-MEMM brings further improvement by exploiting positional information more flexibly.
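The core idea of relaxing the stationary hypothesis, letting the maxent feature weights depend on where a token sits in the sentence instead of being shared across all positions, can be sketched as follows. This is a minimal, hypothetical Python illustration: the tag set, features, position buckets, and weights are invented for the example and are not the paper's actual model or parameter values.

```python
import math

# Toy tag set and a hand-set weight table. In a stationary MEMM the weight of
# a feature is the same at every position; the nonstationary variant lets
# features mention a coarse position bucket. All names and weights here are
# illustrative assumptions.
TAGS = ["NOUN", "VERB"]

def position_bucket(i, n):
    """Map an absolute position to a coarse bucket: begin / middle / end."""
    if i == 0:
        return "B"
    if i == n - 1:
        return "E"
    return "M"

# weights[(feature, tag)] -> float; bucket features encode positional info.
WEIGHTS = {
    ("prev=START", "NOUN"): 1.0,
    ("bucket=B", "NOUN"): 0.5,   # sentence-initial tokens lean NOUN
    ("bucket=E", "VERB"): 0.5,   # sentence-final tokens lean VERB
    ("word=runs", "VERB"): 1.5,
}

def local_probs(word, prev_tag, i, n, nonstationary=True):
    """P(tag | prev_tag, word, position) under a log-linear model."""
    feats = [f"prev={prev_tag}", f"word={word}"]
    if nonstationary:
        feats.append(f"bucket={position_bucket(i, n)}")
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in feats) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

def viterbi(words, nonstationary=True):
    """Standard Viterbi decoding over the locally normalized MEMM scores."""
    n = len(words)
    delta = {t: math.log(local_probs(words[0], "START", 0, n, nonstationary)[t])
             for t in TAGS}
    back = []
    for i in range(1, n):
        new_delta, ptr = {}, {}
        for t in TAGS:
            best_prev = max(
                TAGS,
                key=lambda p: delta[p] + math.log(
                    local_probs(words[i], p, i, n, nonstationary)[t]))
            ptr[t] = best_prev
            new_delta[t] = delta[best_prev] + math.log(
                local_probs(words[i], best_prev, i, n, nonstationary)[t])
        delta, back = new_delta, back + [ptr]
    # Reconstruct the best tag sequence from the backpointers.
    last = max(TAGS, key=lambda t: delta[t])
    seq = [last]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))
```

With `nonstationary=True`, the sentence-final bucket feature boosts the VERB probability for the last token beyond what the stationary model assigns, which is the kind of positional effect the article exploits. Smoothing (needed because position-specific counts are sparser) is omitted from this sketch.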


Cited By

  • (2018) Lightweight Context-Based Web-Service Composition Model for Mobile Devices. Digital Business, 199-222. DOI: 10.1007/978-3-319-93940-7_9
  • (2018) Attention-Based CNN-BLSTM Networks for Joint Intent Detection and Slot Filling. Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 250-261. DOI: 10.1007/978-3-030-01716-3_21
  • (2012) A probabilistic model with multi-dimensional features for object extraction. Frontiers of Computer Science 6(5), 513-526. DOI: 10.1007/s11704-012-1093-3

Published In

ACM Transactions on Asian Language Information Processing (TALIP), Volume 6, Issue 2
September 2007, 84 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1282080

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. MEMM
      2. Markov property
      3. Pos-tagging
      4. data sparseness problem
      5. stationary hypothesis
