Developing an Algorithm for Mining Semantics in Texts

Huang, Minhua; Haralick, Robert M.

doi:10.1007/978-3-642-28601-8_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper discusses an algorithm for identifying the semantic arguments of a verb, the word senses of a polysemous word, and the noun phrases in a sentence. The heart of the algorithm is a probabilistic graphical model. In contrast with other existing graphical models, such as Naive Bayes models, CRFs, HMMs, and MEMMs, this model determines a sequence of optimal class assignments among M choices for a sequence of N input symbols without using dynamic programming; it runs in O(MN) time and uses O(M) memory. Experiments conducted on standard data sets show encouraging results.
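
The abstract does not spell out the model itself, so the following is only a rough sketch of how a sequence of N symbols might be labeled with one of M classes in O(MN) time without dynamic programming. It assumes a greedy left-to-right decoder in which each decision depends only on the current symbol and the previously chosen class; this is an illustrative stand-in, not the authors' algorithm. The names `greedy_decode` and `toy_score`, and the hand-set scores for B/I/O noun-phrase tags, are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def greedy_decode(
    symbols: Sequence[str],
    classes: Sequence[str],
    score: Callable[[Optional[str], str, str], float],
) -> List[str]:
    """Label each symbol with its best-scoring class, left to right.

    Each of the N positions evaluates M candidate classes once, so the
    total work is O(MN). Apart from the output list, only the previous
    label and the current best candidate are kept; no N-by-M table is
    stored, unlike Viterbi-style dynamic programming.
    """
    labels: List[str] = []
    prev: Optional[str] = None  # no previous class at the first position
    for x in symbols:
        # Pick the class with the highest score given the previous choice.
        best = max(classes, key=lambda c: score(prev, c, x))
        labels.append(best)
        prev = best
    return labels


def toy_score(prev: Optional[str], c: str, x: str) -> float:
    """Hypothetical hand-set scores for B/I/O noun-phrase chunk tags."""
    emit = {
        ("the", "B"): 2.0,
        ("cat", "I"): 1.0,
        ("cat", "B"): 0.5,
        ("sat", "O"): 2.0,
    }
    s = emit.get((x, c), 0.0)
    # Penalize an "inside" tag that does not continue an open chunk.
    if c == "I" and prev not in ("B", "I"):
        s -= 5.0
    return s


if __name__ == "__main__":
    print(greedy_decode(["the", "cat", "sat"], ["B", "I", "O"], toy_score))
```

A greedy decoder like this trades the global optimality that Viterbi decoding provides for lower time and memory cost; whether the paper's model makes the same trade-off cannot be determined from the abstract alone.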

Author information

Authors and Affiliations

Computer Science Department, The Graduate School and University Center, The City University of New York, New York, NY, USA, 10016
Minhua Huang & Robert M. Haralick

Editor information

Editors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Huang, M., Haralick, R.M. (2012). Developing an Algorithm for Mining Semantics in Texts. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_21

DOI: https://doi.org/10.1007/978-3-642-28601-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science, Computer Science (R0)
