DOI: 10.1145/1081870.1081876
Article

Variable latent semantic indexing

Published: 21 August 2005

Abstract

Latent Semantic Indexing is a classical method to produce optimal low-rank approximations of a term-document matrix. However, in the context of a particular query distribution, the approximation thus produced need not be optimal. We propose VLSI, a new query-dependent (or "variable") low-rank approximation that minimizes approximation error for any specified query distribution. With this tool, it is possible to tailor the LSI technique to particular settings, often resulting in vastly improved approximations at much lower dimensionality. We validate this method via a series of experiments on classical corpora, showing that VLSI typically matches the performance of LSI while using an order of magnitude fewer dimensions.
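
The contrast drawn in the abstract can be illustrated with a small NumPy sketch. It is not taken from the paper: the function names (`truncate_svd`, `query_error`, `query_weighted_rank_k`), the toy data, and the choice to weight the approximation by the square root of an empirical query co-occurrence matrix `C` are all assumptions made here for illustration. The sketch relies on the standard fact that the expected squared error of query-document scores, E‖qᵀ(A − B)‖², equals ‖C^{1/2}(A − B)‖²_F with C = E[qqᵀ], so when C is positive definite a rank-k minimizer is C^{-1/2} times the truncated SVD of C^{1/2}A.

```python
import numpy as np

def truncate_svd(M, k):
    """Best rank-k approximation of M in Frobenius norm (plain LSI)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def query_error(A, B, Q):
    """Mean over queries (columns of Q) of ||q^T A - q^T B||^2."""
    R = Q.T @ (A - B)
    return float(np.sum(R * R) / Q.shape[1])

def query_weighted_rank_k(A, Q, k):
    """Rank-k approximation minimizing query_error for the sample Q.

    Assumption: the empirical co-occurrence matrix C = Q Q^T / n is
    positive definite, so its symmetric square root is invertible.
    """
    C = (Q @ Q.T) / Q.shape[1]
    w, V = np.linalg.eigh(C)            # C = V diag(w) V^T
    W = (V * np.sqrt(w)) @ V.T          # C^{1/2}
    W_inv = (V / np.sqrt(w)) @ V.T      # C^{-1/2}
    return W_inv @ truncate_svd(W @ A, k)

# Toy data (hypothetical): 8 terms, 30 documents, 200 sampled queries
# whose mass is concentrated on the first few terms.
rng = np.random.default_rng(0)
A = rng.random((8, 30))
scale = np.array([10, 5, 2, 1, 0.5, 0.2, 0.1, 0.1])
Q = scale[:, None] * rng.random((8, 200))

k = 2
A_lsi = truncate_svd(A, k)
A_qw = query_weighted_rank_k(A, Q, k)
err_lsi = query_error(A, A_lsi, Q)
err_qw = query_error(A, A_qw, Q)
print("query-weighted error, LSI:           ", err_lsi)
print("query-weighted error, query-weighted:", err_qw)
```

At the same rank, the query-weighted approximation can never have a larger query error than the LSI approximation on the sampled queries, while LSI remains at least as good in the unweighted Frobenius norm. How closely this construction matches the algorithm evaluated in the paper is not established by the abstract.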

Cited By

  • (2016) Development and Research of the Text Messages Semantic Clustering Methodology. 2016 Third European Network Intelligence Conference (ENIC), pages 180-187. DOI: 10.1109/ENIC.2016.034. Online publication date: Sep-2016
  • (2016) Investigating the Optimise k-Dimensions and Threshold Values of Latent Semantic Indexing Retrieval Performance for Small Malay Language Corpus. Regional Conference on Science, Technology and Social Sciences (RCSTSS 2014), pages 325-336. DOI: 10.1007/978-981-10-0534-3_31. Online publication date: 25-Mar-2016
  • (2006) Improved Approximation Algorithms for Large Matrices via Random Projections. Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 143-152. DOI: 10.1109/FOCS.2006.37. Online publication date: 21-Oct-2006

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN:159593135X
DOI:10.1145/1081870
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LSI
  2. SVD
  3. VLSI
  4. linear algebra
  5. matrix approximation

Qualifiers

  • Article

Conference

KDD05

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
