Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization

  • Conference paper
Applied Informatics and Communication (ICAIC 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 226)

Abstract

Many text categorization tasks involve imbalanced training examples. We tackle this problem with an improved local Latent Semantic Analysis (LSA). LSA has been shown to be extremely useful, but it is not an optimal representation for text categorization: as an unsupervised method, it concentrates on representation and ignores class discrimination. Some local Latent Semantic Indexing (LSI) methods have been proposed to improve classification by exploiting class discrimination information. In this paper, we use a support vector machine (SVM) to generate imbalanced datasets that serve as the local regions for local LSA. Experimental results show that our method outperforms global LSA and traditional local LSA methods on classification while using a much smaller LSA dimension.
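The abstract does not say how the SVM defines a local region, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the region consists of every minority-class document plus the negatives that a linear SVM scores highest, and that LSA is then computed on that region alone. The toy term counts, the selection rule, and all function names (`train_linear_svm`, `select_local_region`, `top_right_singular_vectors`, `project`) are hypothetical.

```python
import math

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / len(X)
                gb -= yi / len(X)
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b

def decision(w, b, x):
    """Signed SVM score of one document vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def select_local_region(X, y, w, b, n_neg):
    """ASSUMED rule: keep all positives plus the n_neg highest-scoring negatives."""
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = sorted((i for i, yi in enumerate(y) if yi == -1),
                 key=lambda i: decision(w, b, X[i]), reverse=True)
    return pos + neg[:n_neg]

def top_right_singular_vectors(A, k, iters=200):
    """Power iteration with deflation on A^T A (documents are rows of A)."""
    d = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(d)] for i in range(d)]
    basis = []
    for c in range(k):
        v = [1.0 + 0.1 * (i + c) for i in range(d)]  # deterministic start vector
        for _ in range(iters):
            v = [sum(G[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in basis:  # remove components along vectors already found
                dot = sum(ui * vi for ui, vi in zip(u, v))
                v = [vi - dot * ui for ui, vi in zip(u, v)]
            norm = math.sqrt(sum(vi * vi for vi in v))
            v = [vi / norm for vi in v]
        basis.append(v)
    return basis

def project(basis, x):
    """Map a term vector into the k-dimensional local LSA space."""
    return [sum(ui * xi for ui, xi in zip(u, x)) for u in basis]

# Toy term-count vectors (rows are documents); the positive class is the minority.
X = [[2, 1, 0, 0, 0], [1, 2, 0, 0, 0],          # positives
     [0, 1, 2, 0, 0], [1, 0, 2, 0, 0],          # negatives sharing terms with positives
     [0, 0, 0, 2, 1], [0, 0, 0, 1, 2],
     [0, 0, 1, 2, 0], [0, 0, 0, 2, 2]]
y = [1, 1, -1, -1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
region = select_local_region(X, y, w, b, n_neg=2)
basis = top_right_singular_vectors([X[i] for i in region], k=2)
reduced = [project(basis, x) for x in X]  # every document in the local LSA space
```

In this sketch the singular vectors are fitted on four documents instead of eight, so the latent dimensions are shaped by the minority class and the negatives the SVM ranks closest to it. Global LSA would instead fit them on the full collection.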

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wan, Y., Tong, H., Deng, Y. (2011). Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization. In: Zhang, J. (eds) Applied Informatics and Communication. ICAIC 2011. Communications in Computer and Information Science, vol 226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23235-0_42

  • DOI: https://doi.org/10.1007/978-3-642-23235-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23234-3

  • Online ISBN: 978-3-642-23235-0

  • eBook Packages: Computer Science, Computer Science (R0)
