
Discovering Text Patterns by a New Graphic Model

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Abstract

We propose a probabilistic graphical model for recognizing three types of text patterns in a sentence: noun phrases, the meaning of an ambiguous word, and the semantic arguments of a verb. Compared with existing graphical models such as CRFs, HMMs, and MEMMs, the model has a distinct mathematical expression and graphical representation. In our model, the optimal sequence of categories for a sequence of symbols is determined by finding the optimal category for each symbol independently. Two consequences follow. First, dynamic programming is not needed, so the on-line time and memory complexity are reduced. Second, the misclassification rate is decreased. Experiments on standard data sets show good results. For instance, our method achieves an average precision of 97.7% and an average recall of 98.8% for recognizing noun phrases on WSJ data from the Penn Treebank; an average accuracy of 81.12% for disambiguating the six-sense word 'line'; and an average precision of 92.96% and an average recall of 94.94% for classifying the boundaries of a verb's semantic arguments on WSJ data from the Penn Treebank and PropBank. On each task, performance surpasses or approaches the state of the art.
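The abstract's central claim, that each symbol's category can be chosen independently so no dynamic programming is required, can be illustrated with a minimal sketch. This is not the authors' model: the scorer `toy_score`, the B/I/O tag set, and the POS-tag input are hypothetical stand-ins chosen only to show the decoding step. Each position is labeled by an argmax over K categories; the scorer may look at the input context but never at neighbouring predicted labels, so decoding takes O(nK) score evaluations, whereas first-order Viterbi decoding over label transitions in an HMM or CRF takes O(nK²) time.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer signature: (input symbols, position, candidate category) -> score.
Scorer = Callable[[Sequence[str], int, str], float]

# Hypothetical set of POS tags treated as noun-phrase material in the toy scorer.
NOMINAL = {"DT", "JJ", "NN", "NNS", "PRP"}


def decode_independently(symbols: Sequence[str], categories: Sequence[str],
                         score: Scorer) -> List[str]:
    """Label each position with its own highest-scoring category.

    No position's decision depends on another position's predicted label,
    so no dynamic programming (e.g. Viterbi) is needed: O(n*K) score
    evaluations for n symbols and K categories.
    """
    return [max(categories, key=lambda c: score(symbols, i, c))
            for i in range(len(symbols))]


def toy_score(tags: Sequence[str], i: int, cat: str) -> float:
    """Hypothetical hand-written scorer for B/I/O noun-phrase chunking over POS tags.

    It inspects the current and previous *input* tags only, not previous labels.
    """
    nominal = tags[i] in NOMINAL
    prev_nominal = i > 0 and tags[i - 1] in NOMINAL
    if not nominal:
        return 1.0 if cat == "O" else 0.0
    if cat == "I":
        return 1.0 if prev_nominal else 0.0
    if cat == "B":
        return 0.5 if prev_nominal else 1.0
    return 0.0  # "O" is never preferred for a nominal tag


if __name__ == "__main__":
    tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
    print(decode_independently(tags, ["B", "I", "O"], toy_score))
    # → ['B', 'I', 'I', 'O', 'B', 'I']
```

Because each decision in this sketch ignores the neighbouring predicted labels, invalid label sequences (such as `O` followed by `I`) are not excluded structurally; any such constraint has to come from the scorer itself. That is the trade-off accepted in exchange for dropping the dynamic-programming pass.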




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, M., Haralick, R.M. (2011). Discovering Text Patterns by a New Graphic Model. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_32


  • DOI: https://doi.org/10.1007/978-3-642-23199-5_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23198-8

  • Online ISBN: 978-3-642-23199-5

  • eBook Packages: Computer Science, Computer Science (R0)
