Abstract
In text classification, one key problem is the inherent dichotomy of polysemy and synonymy; another is the insufficient use of abundant, useful, but unlabeled text documents. To address these problems, we combine sprinkled Latent Semantic Indexing (LSI) with background knowledge for text classification. The motivation is twofold: 1) LSI is a popular technique for information retrieval, and it has also been applied successfully to text classification, where it addresses the problems of polysemy and synonymy; 2) by fusing sprinkled terms with terms from unlabeled documents, our method not only takes class relationships into account but also exploits unlabeled information. Finally, experimental results on text documents demonstrate that the proposed method improves classification performance.
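The two ingredients named in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it assumes (1) sprinkling means appending one artificial "class term" per class to each labeled document, (2) background knowledge means appending unlabeled documents as extra columns whose class-term rows are zero, and (3) after a truncated SVD the sprinkled rows are dropped and test documents are classified by 1-nearest-neighbour cosine similarity against the reconstructed labeled documents. The abstract does not state how many sprinkled terms are used, how they are weighted, or which classifier is applied, so `weight`, `k`, and the helper names `sprinkled_lsi_fit` and `predict` are hypothetical.

```python
import numpy as np


def sprinkled_lsi_fit(X_train, y, X_background, n_classes, k, weight=1.0):
    """Hypothetical sketch: sprinkled LSI over labeled + unlabeled documents.

    X_train:      (n_terms, n_train) term-by-document matrix of labeled docs.
    y:            class index (0..n_classes-1) of each labeled doc.
    X_background: (n_terms, n_bg) term-by-document matrix of unlabeled docs.
    k:            number of singular triplets kept by the truncated SVD.
    weight:       assumed value of each sprinkled class term.
    Returns the rank-k reconstruction of the labeled docs, original terms only.
    """
    n_terms, n_train = X_train.shape
    n_bg = X_background.shape[1]

    # Sprinkling: one artificial term per class; a labeled doc gets `weight`
    # on its own class row and 0 on the others.
    sprinkle = np.zeros((n_classes, n_train))
    sprinkle[np.asarray(y), np.arange(n_train)] = weight

    # Background (unlabeled) docs carry no class information: zero class rows.
    labeled = np.vstack([X_train, sprinkle])
    background = np.vstack([X_background, np.zeros((n_classes, n_bg))])
    A = np.hstack([labeled, background]).astype(float)

    # Truncated SVD: rank-k approximation A_k = U_k S_k V_k^T.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Drop the sprinkled rows and the background columns.
    return A_k[:n_terms, :n_train]


def predict(R, y, X_test):
    """1-nearest-neighbour by cosine similarity in the original term space."""
    eps = 1e-12  # guards against division by zero for empty documents
    R_unit = R / (np.linalg.norm(R, axis=0, keepdims=True) + eps)
    Q_unit = X_test / (np.linalg.norm(X_test, axis=0, keepdims=True) + eps)
    sims = Q_unit.T @ R_unit  # (n_test, n_train) cosine similarities
    return np.asarray(y)[np.argmax(sims, axis=1)]


# Toy data, terms t0..t6 (rows). Class 0 docs use t0-t2, class 1 docs use t3-t5.
# t6 never occurs in a labeled doc, only in an unlabeled doc together with t0.
X_train = np.array([[1, 0, 0, 0],
                    [1, 1, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 1, 0],
                    [0, 0, 1, 1],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]])
y = [0, 0, 1, 1]
X_background = np.array([[1, 0], [0, 0], [0, 0], [0, 1], [0, 0], [0, 1], [1, 0]])
# Test docs (columns): {t6}, {t0, t2}, {t4}.
X_test = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0],
                   [0, 0, 1], [0, 0, 0], [1, 0, 0]])

R = sprinkled_lsi_fit(X_train, y, X_background, n_classes=2, k=2)
print(predict(R, y, X_test))  # expected: [0 0 1]
```

The first test document contains only t6, which shares no term with any labeled document, so raw cosine similarity gives no signal; the co-occurrence of t6 with t0 in the unlabeled document lets the rank-2 reconstruction assign it to class 0. Whether the paper classifies in the reconstructed term space, as here, or directly in the k-dimensional latent space is not stated in the abstract.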
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, H., King, I. (2009). Sprinkled Latent Semantic Indexing for Text Classification with Background Knowledge. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03040-6_7
DOI: https://doi.org/10.1007/978-3-642-03040-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03039-0
Online ISBN: 978-3-642-03040-6
eBook Packages: Computer Science; Computer Science (R0)