Article

Latent semantic analysis for multiple-type interrelated data objects

Authors:
Xuanhui Wang, University of Illinois at Urbana-Champaign
Jian-Tao Sun, Microsoft Research Asia, Beijing, P.R. China
Zheng Chen, Microsoft Research Asia, Beijing, P.R. China
ChengXiang Zhai, University of Illinois at Urbana-Champaign

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 2006, Pages 236–243. https://doi.org/10.1145/1148170.1148214

Published: 06 August 2006

ABSTRACT

Co-occurrence data is common in real applications. Latent Semantic Analysis (LSA) has been used successfully to identify semantic relations in such data. However, LSA can handle only a single co-occurrence relationship between two types of objects. In practice, there are many cases where multiple types of objects exist and any pair of these types may have a pairwise co-occurrence relation. All of these co-occurrence relations can be exploited to alleviate data sparseness or to represent objects more meaningfully. In this paper, we propose a novel algorithm, M-LSA, which conducts latent semantic analysis by incorporating all pairwise co-occurrences among multiple types of objects. Based on the mutual reinforcement principle, M-LSA identifies the most salient concepts among the co-occurrence data and represents all the objects in a unified semantic space. M-LSA is general, and we show that several variants of LSA are special cases of our algorithm. Experimental results show that M-LSA outperforms LSA on multiple applications, including collaborative filtering, text clustering, and text categorization.
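
The idea of placing several object types in one semantic space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details on weighting or normalization, so the code below simply assumes that the pairwise co-occurrence matrices are stacked into one symmetric block matrix and that the leading eigenvectors of that matrix serve as the shared concepts. The function names `unified_matrix` and `top_concepts`, and the toy user/item/term data, are hypothetical.

```python
import numpy as np


def unified_matrix(sizes, relations):
    """Assemble a symmetric block matrix from pairwise co-occurrence matrices.

    sizes:     number of objects of each type, e.g. [n_users, n_items, n_terms].
    relations: dict mapping a type pair (i, j), i < j, to an array of shape
               (sizes[i], sizes[j]). Missing pairs are treated as all zeros.
    Assumption: diagonal blocks (same-type relations) are left at zero.
    """
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    total = int(offsets[-1])
    R = np.zeros((total, total))
    for (i, j), M in relations.items():
        M = np.asarray(M, dtype=float)
        rows = slice(offsets[i], offsets[i + 1])
        cols = slice(offsets[j], offsets[j + 1])
        R[rows, cols] = M      # block for type i vs. type j
        R[cols, rows] = M.T    # mirrored block keeps R symmetric
    return R


def top_concepts(R, k):
    """Return the k largest eigenvalues of symmetric R and their eigenvectors."""
    vals, vecs = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]      # indices of the k largest
    return vals[order], vecs[:, order]


# Toy data (made up for illustration): 3 users, 4 items, 5 terms.
rng = np.random.default_rng(0)
sizes = [3, 4, 5]
relations = {
    (0, 1): rng.integers(0, 3, size=(3, 4)),  # user-item co-occurrence
    (1, 2): rng.integers(0, 3, size=(4, 5)),  # item-term co-occurrence
    (0, 2): rng.integers(0, 3, size=(3, 5)),  # user-term co-occurrence
}
R = unified_matrix(sizes, relations)
values, embedding = top_concepts(R, k=2)

# Each row of `embedding` is one object (user, item, or term) in the same
# 2-dimensional space, so objects of different types can be compared directly.
print(embedding.shape)
```

With only two object types and one relation matrix A, the block matrix reduces to [[0, A], [Aᵀ, 0]], whose positive eigenvalues are the singular values of A. Under this sketch's assumptions, that is one way to see how ordinary LSA (a truncated SVD of A) can appear as a special case.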

Index Terms

Latent semantic analysis for multiple-type interrelated data objects
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing


Published in
SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN:1595933697
DOI:10.1145/1148170
General Chair: Efthimis N. Efthimiadis (University of Washington)
Program Chairs: Susan Dumais (Microsoft Research, Redmond), David Hawking (CSIRO ICT Centre, Canberra, Australia), Kalervo Järvelin (University of Tampere, Finland)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
LSA
M-LSA
multiple-type
mutual reinforcement principle
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%