Topic Sequence Kernel

Xu, Jian; Lu, Qin; Liu, Zhengzhong; Chai, Junyi

doi:10.1007/978-3-642-35341-3_41


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)

Included in the following conference series:

Asia Information Retrieval Symposium


Abstract

This paper addresses the problem of classifying documents with kernel methods based on topic sequences. The string kernel uses ordered subsequences of characters as features, and the word sequence kernel was later proposed to use words as the symbols of those subsequences. Both, however, suffer from high computational complexity because of the large number of symbols (characters or words). This paper therefore proposes using sequences of topics rather than characters or words, which reduces the number of symbols and thus improves computational efficiency. Documents that exhibit similar posterior topic proportions are expected to have similar topic sequences and should therefore be classified into the same category. Experiments conducted on the Reuters-21578 dataset support this hypothesis.
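To make the idea concrete, here is a minimal sketch of a gap-weighted subsequence kernel (the string-kernel formulation of Lodhi et al.) applied to sequences of topic IDs instead of characters or words. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how topic sequences are derived, so each document is assumed to arrive already encoded as a list of integer topic IDs (for example, one topic per token from some topic model). The function names, the order `p`, and the decay factor `lam` are hypothetical choices made for this example.

```python
from math import sqrt
from typing import List, Sequence


def subsequence_kernel(s: Sequence[int], t: Sequence[int],
                       p: int = 2, lam: float = 0.5) -> float:
    """Gap-weighted subsequence kernel of order p on two topic-ID sequences.

    Every common (possibly non-contiguous) subsequence of length p adds
    lam ** (span in s + span in t), so matches with gaps are down-weighted.
    """
    n, m = len(s), len(t)
    if p < 1 or min(n, m) < p:
        return 0.0

    def matches(i: int, j: int) -> bool:
        # 1-based positions; row/column 0 is padding for the recurrences.
        return i > 0 and j > 0 and s[i - 1] == t[j - 1]

    # ends[i][j]: weight of common subsequences of the current length
    # whose last symbols are exactly s[i-1] and t[j-1] (length 1 to start).
    ends: List[List[float]] = [
        [lam * lam if matches(i, j) else 0.0 for j in range(m + 1)]
        for i in range(n + 1)
    ]
    for _length in range(2, p + 1):
        # acc[i][j]: gap-discounted sum of ends over all positions <= (i, j).
        acc = [[0.0] * (m + 1) for _row in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i][j] = (ends[i][j]
                             + lam * acc[i - 1][j]
                             + lam * acc[i][j - 1]
                             - lam * lam * acc[i - 1][j - 1])
        # Extend every shorter match by one more matching pair of symbols.
        ends = [
            [lam * lam * acc[i - 1][j - 1] if matches(i, j) else 0.0
             for j in range(m + 1)]
            for i in range(n + 1)
        ]
    return sum(sum(row) for row in ends)


def normalized_kernel(s: Sequence[int], t: Sequence[int],
                      p: int = 2, lam: float = 0.5) -> float:
    """Cosine-style normalization so that sequence length matters less."""
    denom = sqrt(subsequence_kernel(s, s, p, lam) * subsequence_kernel(t, t, p, lam))
    return subsequence_kernel(s, t, p, lam) / denom if denom > 0 else 0.0


# Hypothetical documents, already mapped to topic-ID sequences.
doc_a = [1, 2, 3]
doc_b = [1, 2, 4]
similarity = normalized_kernel(doc_a, doc_b)
```

With `lam = 0.5` and `p = 2`, `doc_a` and `doc_b` share only the subsequence `(1, 2)`, which spans two positions in each sequence, so the unnormalized kernel value is `0.5 ** 4 = 0.0625`. Because the symbols are topic IDs, the alphabet is bounded by the number of topics rather than by the vocabulary size.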



Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hong Kong
Jian Xu, Qin Lu, Zhengzhong Liu & Junyi Chai


Editor information

Editors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
Yuexian Hou
DIRO, University of Montreal, CP. 6128, succursale Centre-ville, H3C 3J7, Montreal, QC, Canada
Jian-Yun Nie
Institute of Software, Storage & Information Retrieval Laboratory, Chinese Academy of Sciences, 100190, Beijing, China
Le Sun
School of Computer Science and Technology, Tianjin University, 300072, Tianjin, China
Bo Wang
School of Computing, Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Peng Zhang


About this paper

Cite this paper

Xu, J., Lu, Q., Liu, Z., Chai, J. (2012). Topic Sequence Kernel. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_41


DOI: https://doi.org/10.1007/978-3-642-35341-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)
