Kernels for Text Analysis

Tsivtsivadze, Evgeni; Pahikkala, Tapio; Boberg, Jorma; Salakoski, Tapio

doi:10.1007/978-3-540-78297-1_4

Part of the book series: Studies in Computational Intelligence (SCI, volume 116)

Summary

During the past decade, kernel methods have proved successful in a range of text analysis tasks. Several properties make kernel-based methods applicable to many real-world problems, especially in domains where data is not naturally represented in vector form. First, instead of requiring manual construction of the feature space for the learning task, kernel functions provide a way to design useful features automatically, allowing very rich representations. Second, kernels can be designed to incorporate prior knowledge about the domain. This property can notably improve the performance of general learning methods and simplifies their adaptation to a specific problem. Finally, kernel methods are naturally applicable when the data representation is not vectorial, which avoids an extensive preprocessing step. In this chapter, we present the main ideas behind kernel methods in general and kernels for text analysis in particular, and we provide an example of designing a feature space for the parse ranking problem with different kernel functions.
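To make the idea of comparing non-vectorial data concrete, the following is a minimal sketch of a k-spectrum string kernel, a standard string kernel that counts shared substrings of length k. It is an illustration only and is not claimed to be one of the kernels developed in the chapter; the function names `spectrum_features`, `spectrum_kernel`, and `normalized_spectrum_kernel` and the default `k=3` are assumptions made for this example.

```python
from collections import Counter


def spectrum_features(text, k=3):
    """Count every contiguous substring of length k in text (hypothetical helper)."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t.

    The feature vectors are never materialised over the full alphabet;
    only substrings that actually occur are stored.
    """
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    if len(fs) > len(ft):
        fs, ft = ft, fs  # iterate over the smaller feature map
    return sum(count * ft[kmer] for kmer, count in fs.items())


def normalized_spectrum_kernel(s, t, k=3):
    """Cosine-normalised kernel value in [0, 1]; 0.0 if either string has no k-mers."""
    denom = (spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k)) ** 0.5
    return spectrum_kernel(s, t, k) / denom if denom else 0.0


if __name__ == "__main__":
    a = "kernel methods for text"
    b = "kernels for text analysis"
    print(spectrum_kernel(a, b), normalized_spectrum_kernel(a, b))
```

The kernel value can be passed to any kernel-based learner in place of a dot product between explicit feature vectors, which is the sense in which no vectorial preprocessing of the strings is needed.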

Author information

Authors and Affiliations

Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Turku, Finland
Evgeni Tsivtsivadze, Tapio Pahikkala, Jorma Boberg & Tapio Salakoski

Editor information

Editors and Affiliations

Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
Ying Liu
School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore, 639798
Aixin Sun & Ee-Peng Lim
Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117576
Han Tong Loh & Wen Feng Lu

About this chapter

Cite this chapter

Tsivtsivadze, E., Pahikkala, T., Boberg, J., Salakoski, T. (2008). Kernels for Text Analysis. In: Liu, Y., Sun, A., Loh, H.T., Lu, W.F., Lim, EP. (eds) Advances of Computational Intelligence in Industrial Systems. Studies in Computational Intelligence, vol 116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78297-1_4

DOI: https://doi.org/10.1007/978-3-540-78297-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78296-4
Online ISBN: 978-3-540-78297-1
eBook Packages: Engineering, Engineering (R0)
