Regularized locality preserving indexing via spectral regression

Published: 06 November 2007

Abstract

We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Unlike Latent Semantic Indexing (LSI), which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI is inefficient in both time and memory, which makes it difficult to apply to very large data sets. Specifically, computing LPI involves eigen-decompositions of two dense matrices, which is expensive. In this paper, we propose a new algorithm called Regularized Locality Preserving Indexing (RLPI). Building on recent progress in spectral graph analysis, we cast the original LPI algorithm into a regression framework, which enables us to avoid eigen-decomposition of dense matrices. In addition, the regression-based framework allows different kinds of regularizers to be incorporated naturally into our algorithm, which makes it more flexible. Extensive experimental results show that RLPI obtains similar or better results compared to LPI and is significantly faster, which makes it an efficient and effective data preprocessing method for large-scale text clustering, classification, and retrieval.
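The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general two-step idea it describes: take eigenvectors of a sparse document graph as regression targets, then fit projection vectors by regularized least squares instead of decomposing dense matrices. The cosine k-nearest-neighbour graph, the ridge penalty `alpha`, the toy data, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh, lsqr


def toy_corpus(n_docs=40, n_terms=20, seed=0):
    """Hypothetical two-topic term-frequency matrix (rows = documents)."""
    rng = np.random.default_rng(seed)
    X = 0.05 * rng.random((n_docs, n_terms))  # background noise
    hd, ht = n_docs // 2, n_terms // 2
    X[:hd, :ht] += rng.random((hd, ht))                      # topic 1
    X[hd:, ht:] += rng.random((n_docs - hd, n_terms - ht))   # topic 2
    return X


def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour graph under cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)            # never pick a document as its own neighbour
    nbrs = np.argsort(-S, axis=1)[:, :k]    # k most similar documents per row
    n = X.shape[0]
    rows = np.repeat(np.arange(n), k)
    W = csr_matrix((np.ones(n * k), (rows, nbrs.ravel())), shape=(n, n))
    return W.maximum(W.T)                   # symmetrise


def rlpi_sketch(X, n_components=2, k=5, alpha=1.0):
    """Return (A, Y): projection vectors A and the graph responses Y they fit."""
    n = X.shape[0]
    W = knn_graph(X, k)
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(d))

    # Step 1: leading eigenvectors of the sparse matrix D^-1/2 W D^-1/2,
    # i.e. solutions of W y = lambda D y after the change of variables below.
    M = D_inv_sqrt @ W @ D_inv_sqrt
    v0 = np.random.default_rng(0).random(n)  # fixed start vector for repeatability
    vals, vecs = eigsh(M, k=n_components + 1, which="LA", v0=v0)
    order = np.argsort(-vals)
    Y = D_inv_sqrt @ vecs[:, order]
    Y = Y[:, 1:]  # on a connected graph the first response is constant, so drop it

    # Step 2: one ridge regression per response, solved iteratively with LSQR.
    # lsqr minimises ||X a - y||^2 + damp^2 ||a||^2, hence damp = sqrt(alpha).
    cols = [
        lsqr(X, Y[:, j], damp=np.sqrt(alpha), atol=1e-10, btol=1e-10, iter_lim=1000)[0]
        for j in range(Y.shape[1])
    ]
    return np.column_stack(cols), Y


if __name__ == "__main__":
    X = toy_corpus()
    A, Y = rlpi_sketch(X)
    Z = X @ A  # low-dimensional document representation
    print(Z.shape)
```

In this sketch the only eigenproblem involves the sparse n-by-n graph matrix, and the dense term-document matrix is touched only through matrix-vector products inside LSQR. Swapping the ridge penalty for another regularizer would change only step 2.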

Published In

CIKM '07: Proceedings of the sixteenth ACM conference on information and knowledge management
November 2007
1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dimensionality reduction
  2. document representation and indexing
  3. regularized locality preserving indexing

Qualifiers

  • Research-article

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2024) Fast and Robust Sparsity-Aware Block Diagonal Representation. IEEE Transactions on Signal Processing, 72:305-320. DOI: 10.1109/TSP.2023.3343565. Online publication date: 2024.
  • (2024) Clustering Ensemble via Diffusion on Adaptive Multiplex. IEEE Transactions on Knowledge and Data Engineering, 36(4):1463-1474. DOI: 10.1109/TKDE.2023.3311409. Online publication date: Apr-2024.
  • (2024) Robust Regularized Locality Preserving Indexing for Fiedler Vector Estimation. IEEE Open Journal of Signal Processing, 5:867-885. DOI: 10.1109/OJSP.2024.3400683. Online publication date: 2024.
  • (2023) Adaptive Consensus Clustering for Multiple K-Means Via Base Results Refining. IEEE Transactions on Knowledge and Data Engineering, 35(10):10251-10264. DOI: 10.1109/TKDE.2023.3264970. Online publication date: 1-Oct-2023.
  • (2023) Bi-level ensemble method for unsupervised feature selection. Information Fusion, 100:101910. DOI: 10.1016/j.inffus.2023.101910. Online publication date: Dec-2023.
  • (2021) Robust Spectral Clustering: A Locality Preserving Feature Mapping Based on M-estimation. 2021 29th European Signal Processing Conference (EUSIPCO), pages 851-855. DOI: 10.23919/EUSIPCO54536.2021.9616292. Online publication date: 23-Aug-2021.
  • (2021) Feature Extraction Using Multidimensional Spectral Regression Whitening for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:8326-8340. DOI: 10.1109/JSTARS.2021.3104153. Online publication date: 2021.
  • (2021) Quartic First-Order Methods for Low-Rank Minimization. Journal of Optimization Theory and Applications. DOI: 10.1007/s10957-021-01820-3. Online publication date: 23-Mar-2021.
  • (2020) An Improved Quantum Algorithm for Spectral Regression. 2020 Asia Conference on Computers and Communications (ACCC), pages 11-15. DOI: 10.1109/ACCC51160.2020.9347936. Online publication date: 18-Sep-2020.
  • (2020) A randomized generalized low rank approximations of matrices algorithm for high dimensionality reduction and image compression. Numerical Linear Algebra with Applications, 28(1). DOI: 10.1002/nla.2338. Online publication date: 30-Sep-2020.
