Abstract
Improving accuracy in Information Retrieval tasks by means of semantic information is a complex problem with three main aspects: the document representation model, the similarity estimation metric, and the inductive algorithm. In this paper, an original kernel function sensitive to external semantic knowledge is defined as a document similarity model. This semantic kernel was tested on a text categorization task under critical learning conditions (i.e., poor training data). The results of cross-validation experiments suggest that the proposed kernel function can be used as a general model of document similarity for IR tasks.
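The abstract does not give the definition of the proposed kernel, so the following is only a generic sketch of what a document kernel "sensitive to external semantic knowledge" can look like, not the authors' formulation. It assumes documents are lists of tokens and that some external resource supplies a term-to-term similarity score; the table `TERM_SIMILARITY`, its values, and all function names are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical term-to-term similarity scores standing in for an external
# semantic resource. The values are made up for illustration only.
TERM_SIMILARITY = {
    ("car", "automobile"): 0.9,
    ("car", "vehicle"): 0.6,
    ("automobile", "vehicle"): 0.6,
}


def term_similarity(a, b):
    """Symmetric lookup: identical terms score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return TERM_SIMILARITY.get((a, b), TERM_SIMILARITY.get((b, a), 0.0))


def bag_of_words_kernel(doc_a, doc_b):
    """Plain dot product of term-frequency vectors (exact matches only)."""
    counts_a, counts_b = Counter(doc_a), Counter(doc_b)
    return sum(weight * counts_b[term] for term, weight in counts_a.items())


def semantic_kernel(doc_a, doc_b):
    """Sum over all term pairs, weighted by their semantic similarity."""
    counts_a, counts_b = Counter(doc_a), Counter(doc_b)
    return sum(
        wa * wb * term_similarity(ta, tb)
        for ta, wa in counts_a.items()
        for tb, wb in counts_b.items()
    )


doc_1 = ["car", "engine"]
doc_2 = ["automobile", "engine"]
print(bag_of_words_kernel(doc_1, doc_2))  # counts only the shared term "engine"
print(semantic_kernel(doc_1, doc_2))      # also credits the pair car/automobile
```

In this sketch the two documents share only one literal term, yet the semantic variant scores them higher because related terms contribute as well. This is the kind of effect that may matter when training data is poor. Note that an arbitrary similarity table does not guarantee a valid (positive semi-definite) kernel; that property would have to be checked for any real similarity function.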
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Basili, R., Cammisa, M., Moschitti, A. (2005). A Semantic Kernel to Exploit Linguistic Knowledge. In: Bandini, S., Manzoni, S. (eds) AI*IA 2005: Advances in Artificial Intelligence. AI*IA 2005. Lecture Notes in Computer Science, vol 3673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558590_30
DOI: https://doi.org/10.1007/11558590_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29041-4
Online ISBN: 978-3-540-31733-3
eBook Packages: Computer Science, Computer Science (R0)