Use of Linguistic Features in Context-Sensitive Text Classification

Wong, Alex K. S.; Lee, John W. T.; Yeung, Daniel S.

doi:10.1007/11739685_73

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3930)

Abstract

Many popular Text Classification (TC) models use the simple occurrence of words in a document as the features on which they base their classifications, and their design commonly assumes that word occurrences are statistically independent. Although this assumption does not hold in general, these TC models are robust and efficient at their task. Some recent studies have shown that context-sensitive TC approaches can perform better in general. On the other hand, although complex linguistic or semantic features may intuitively seem more relevant to TC, studies of their effectiveness have produced mixed and inconclusive results. In this paper, we present our investigation into the use of some complex linguistic features with two context-sensitive TC methods. Our experimental results show potential advantages of such an approach.
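As an illustration of the word-occurrence models the abstract refers to, the following is a minimal sketch of a multinomial Naive Bayes classifier, one common model that treats word occurrences as statistically independent given the class. It is not the method studied in the paper. The function names `train` and `classify`, the toy documents, and the labels `grain` and `finance` are all hypothetical and chosen only for this example.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_docs = Counter()               # number of documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior of the class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Independence assumption: each word contributes its own
            # (Laplace-smoothed) term, regardless of neighbouring words.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus, for illustration only.
docs = [
    ("wheat corn harvest rises", "grain"),
    ("corn exports and wheat prices", "grain"),
    ("bank raises interest rates", "finance"),
    ("interest rates and bank loans", "finance"),
]
model = train(docs)
print(classify("wheat harvest prices", model))
print(classify("bank interest loans", model))
```

Because each word contributes an independent term to the score, reordering the words of a document leaves the prediction unchanged. Context-sensitive methods, by contrast, let the effect of one word depend on the presence of others.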

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Alex K. S. Wong, John W. T. Lee & Daniel S. Yeung

Editor information

Editors and Affiliations

Department of Computing, Hong Kong Polytechnic University, P.O. Box, Hong Kong, China
Daniel S. Yeung
School of Creative Media, City University of Hong Kong, China
Zhi-Qiang Liu
Department of Mathematics and Computer Science, Hebei University, 071002, Baoding, Hebei, P.R. China
Xi-Zhao Wang
School of Electrical and Information Engineering, University of Sydney, 2006, NSW, Australia
Hong Yan

About this paper

Cite this paper

Wong, A.K.S., Lee, J.W.T., Yeung, D.S. (2006). Use of Linguistic Features in Context-Sensitive Text Classification. In: Yeung, D.S., Liu, ZQ., Wang, XZ., Yan, H. (eds) Advances in Machine Learning and Cybernetics. Lecture Notes in Computer Science, vol 3930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11739685_73

DOI: https://doi.org/10.1007/11739685_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33584-9
Online ISBN: 978-3-540-33585-6
eBook Packages: Computer Science, Computer Science (R0)
