
Mutual Information Independence Model Using Kernel Density Estimation for Segmenting and Labeling Sequential Data

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2005)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3406)

Abstract

This paper proposes a Mutual Information Independence Model (MIIM) for segmenting and labeling sequential data. MIIM overcomes the strong context-independence assumption of traditional generative HMMs by instead assuming a novel pairwise mutual information independence. As a result, MIIM separately models long-range state dependence in its state transition model in a generative way, and observation dependence in its output model in a discriminative way. In addition, a variable-length pairwise mutual information-based modeling approach and a kNN algorithm using kernel density estimation are proposed to capture the long-range state dependence and the observation dependence, respectively. Evaluation on shallow parsing shows that MIIM effectively captures long-range context dependence when segmenting and labeling sequential data. Notably, using kernel density estimation yields better performance than a classifier-based approach.
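The abstract's "kNN algorithm using kernel density estimation" can be illustrated with a generic Parzen-window-style sketch: instead of a majority vote over the k nearest neighbors, each neighbor contributes a kernel-weighted density score to its class, so closer neighbors count more. This is a hypothetical minimal illustration of the general technique, not the paper's actual output model; the function names, the Gaussian kernel choice, and the bandwidth value are all assumptions.

```python
import math

def gaussian_kernel(dist, bandwidth):
    # Gaussian kernel applied to a scalar distance; closer points get
    # weights near 1, distant points decay toward 0.
    return math.exp(-0.5 * (dist / bandwidth) ** 2)

def knn_kde_classify(query, train, k=3, bandwidth=1.0):
    """Classify `query` by kernel-weighted voting over its k nearest
    neighbours (a per-class kernel density estimate restricted to the
    neighbourhood). `train` is a list of (feature_vector, label) pairs."""
    # Euclidean distance from the query to every training point.
    dists = []
    for x, y in train:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, x)))
        dists.append((d, y))
    dists.sort(key=lambda t: t[0])
    # Accumulate kernel-weighted density per class over the k nearest points.
    scores = {}
    for d, y in dists[:k]:
        scores[y] = scores.get(y, 0.0) + gaussian_kernel(d, bandwidth)
    # Return the class with the highest estimated local density.
    return max(scores, key=scores.get)

# Toy usage: two well-separated clusters with hypothetical labels.
train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_kde_classify((0.5, 0.5), train))  # → A
```

Compared with a plain kNN vote, the kernel weighting makes the decision robust when the k-neighborhood straddles a class boundary, which is one plausible reason a density-based output model could outperform a hard classifier, as the abstract reports.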




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, G., Yang, L., Su, J., Ji, D. (2005). Mutual Information Independence Model Using Kernel Density Estimation for Segmenting and Labeling Sequential Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_15


  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6
