Text Retrieval Using Sparsified Concept Decomposition Matrix

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3314)

Abstract

We examine text retrieval strategies using the sparsified concept decomposition matrix. The centroid vector of a tightly structured text collection provides a general description of text documents in that collection. The union of the centroid vectors forms a concept matrix. The original text data matrix can be projected into the concept space spanned by the concept vectors. We propose a procedure to conduct text retrieval based on the sparsified concept decomposition (SCD) matrix. Our experimental results show that text retrieval based on SCD may enhance the retrieval accuracy and reduce the storage cost, compared with the popular text retrieval technique based on latent semantic indexing with singular value decomposition.
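The pipeline the abstract describes (centroids as concept vectors, projection into the concept space, sparsification, retrieval) can be illustrated with a small sketch. The abstract does not state the clustering algorithm, the sparsification rule, or the similarity measure, so everything below is an assumption for illustration only: cluster labels are taken as given, small entries of the concept matrix are dropped by a fixed magnitude threshold, projection is done by least squares, and documents are ranked by cosine similarity in the concept space. The function names and the toy term-by-document matrix are hypothetical and are not taken from the paper.

```python
import numpy as np


def normalize_columns(M):
    """Scale every column of M to unit Euclidean length."""
    return M / np.linalg.norm(M, axis=0, keepdims=True)


def concept_matrix(A, labels, k):
    """Stack the normalized centroid of each cluster as one column (m x k)."""
    C = np.zeros((A.shape[0], k))
    for j in range(k):
        centroid = A[:, labels == j].mean(axis=1)
        C[:, j] = centroid / np.linalg.norm(centroid)
    return C


def sparsify(C, threshold):
    """Assumed rule: zero every entry whose magnitude is below the threshold."""
    return np.where(np.abs(C) >= threshold, C, 0.0)


def project(C, X):
    """Least-squares coordinates of the columns of X in the span of C."""
    Z, *_ = np.linalg.lstsq(C, X, rcond=None)
    return Z


def rank_documents(C, Z, query):
    """Rank documents by cosine similarity to the query in the concept space."""
    q = project(C, query.reshape(-1, 1)).ravel()
    sims = (q @ Z) / (np.linalg.norm(q) * np.linalg.norm(Z, axis=0) + 1e-12)
    return np.argsort(-sims), sims


# Toy term-by-document matrix: 5 terms (rows), 4 documents (columns).
A = normalize_columns(np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]))
labels = np.array([0, 0, 1, 1])          # assumed, e.g. from a prior clustering

C = concept_matrix(A, labels, k=2)       # dense concept matrix
C_sparse = sparsify(C, threshold=0.25)   # fewer stored nonzeros
Z = project(C_sparse, A)                 # documents in the concept space

query = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # query on the first two terms
order, scores = rank_documents(C_sparse, Z, query)
print("nonzeros:", np.count_nonzero(C), "->", np.count_nonzero(C_sparse))
print("ranking:", order)
```

In this sketch the threshold controls the trade-off: a larger value stores fewer nonzeros but moves the sparsified matrix further from the original centroids, which is where any effect on retrieval accuracy would come from.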

The authors' research was supported in part by the U.S. National Science Foundation under grants CCR-0092532 and ACR-0202934, and in part by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, J., Zhang, J. (2004). Text Retrieval Using Sparsified Concept Decomposition Matrix. In: Zhang, J., He, JH., Fu, Y. (eds) Computational and Information Science. CIS 2004. Lecture Notes in Computer Science, vol 3314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30497-5_82

  • DOI: https://doi.org/10.1007/978-3-540-30497-5_82

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24127-0

  • Online ISBN: 978-3-540-30497-5

  • eBook Packages: Computer Science, Computer Science (R0)
