
Robust Discriminant Analysis of Latent Semantic Feature for Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4223)

Abstract

This paper proposes a Discriminative Semantic Feature (DSF) method for text categorization based on the vector space model. The DSF method has two stages. It first reduces the dimension of the document vector space by Latent Semantic Indexing (LSI). It then applies a Robust linear Discriminant analysis Model (RDM), which improves on classical LDA with an energy-adaptive regularization criterion, to extract discriminative semantic features with enhanced discrimination power. As a result, the DSF method not only uncovers latent semantic structure but also captures discriminative features. Comparative experiments are performed on several state-of-the-art dimension reduction schemes: the proposed DSF, LSI, orthogonal centroid, two-stage LSI+LDA, LDA/QR, and LDA/GSVD. Experiments on the Reuters-21578 text collection show that the proposed method performs better than the other algorithms.
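The two-stage pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Stage 1 is plain LSI via a truncated SVD of the term-document matrix. Stage 2 is Fisher LDA in the LSI space. The abstract does not define RDM's energy-adaptive regularization criterion, so the sketch substitutes a generic ridge term, `alpha * trace(S_w) / d * I`, added to the within-class scatter. The function names `lsi`, `regularized_lda`, `dsf_sketch` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def lsi(term_doc, k):
    """Stage 1 (LSI): project documents onto the top-k left singular vectors.

    term_doc is a (terms x documents) matrix A. Returns the (terms x k) basis
    U_k and the (k x documents) reduced representation U_k^T A.
    """
    u, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    u_k = u[:, :k]
    return u_k, u_k.T @ term_doc


def regularized_lda(x, labels, n_components, alpha=1e-3):
    """Stage 2: Fisher LDA with a ridge-regularized within-class scatter.

    ASSUMPTION: the ridge term below is a generic stand-in, not the paper's
    energy-adaptive regularization criterion. x is (features x samples);
    returns a (features x n_components) projection matrix W.
    """
    d = x.shape[0]
    overall_mean = x.mean(axis=1, keepdims=True)
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        xc = x[:, labels == c]
        mc = xc.mean(axis=1, keepdims=True)
        centered = xc - mc
        s_w += centered @ centered.T
        diff = mc - overall_mean
        s_b += xc.shape[1] * (diff @ diff.T)
    # Shrink S_w toward a scaled identity so it stays positive definite
    # even when there are fewer documents than features.
    ridge = alpha * (np.trace(s_w) / d + 1e-12)
    s_w_reg = s_w + ridge * np.eye(d)
    # Solve S_b w = lambda * S_w_reg w by whitening with the Cholesky
    # factor S_w_reg = L L^T, then taking the top eigenvectors of
    # L^{-1} S_b L^{-T} and mapping them back with L^{-T}.
    l_inv = np.linalg.inv(np.linalg.cholesky(s_w_reg))
    evals, evecs = np.linalg.eigh(l_inv @ s_b @ l_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    return l_inv.T @ evecs[:, order]


def dsf_sketch(term_doc, labels, k, n_components, alpha=1e-3):
    """Two-stage LSI + regularized LDA; returns a (terms x n_components) map G.

    The discriminative features of the documents are then G^T A.
    """
    u_k, reduced = lsi(term_doc, k)
    w = regularized_lda(reduced, np.asarray(labels), n_components, alpha)
    return u_k @ w
```

With C classes, at most C - 1 columns of the returned map are meaningful, because the between-class scatter has rank at most C - 1. Running LSI first also keeps both scatter matrices at k x k rather than terms x terms.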

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, J., Deng, W., Guo, J. (2006). Robust Discriminant Analysis of Latent Semantic Feature for Text Categorization. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_46

  • DOI: https://doi.org/10.1007/11881599_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45916-3

  • Online ISBN: 978-3-540-45917-0

  • eBook Packages: Computer Science, Computer Science (R0)
