ABSTRACT
Co-occurrence data is common in many real-world applications. Latent Semantic Analysis (LSA) has been used successfully to identify semantic relations in such data. However, LSA can handle only a single co-occurrence relationship between two types of objects. In practice, multiple types of objects often exist, and any pair of these types may have its own co-occurrence relation. All of these co-occurrence relations can be exploited to alleviate data sparseness or to represent objects more meaningfully. In this paper, we propose a novel algorithm, M-LSA, which conducts latent semantic analysis by incorporating all pairwise co-occurrences among multiple types of objects. Based on the mutual reinforcement principle, M-LSA identifies the most salient concepts in the co-occurrence data and represents all objects in a unified semantic space. M-LSA is general, and we show that several variants of LSA are special cases of our algorithm. Experimental results show that M-LSA outperforms LSA in multiple applications, including collaborative filtering, text clustering, and text categorization.
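To make the idea of combining several pairwise co-occurrence relations concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the construction, so several things here are assumptions for illustration: the co-occurrence matrices are stacked as off-diagonal blocks of one symmetric matrix, no weighting or normalization is applied, and the leading eigenvectors of that matrix are used as the shared space. The function name `unified_embedding` and the toy data are hypothetical.

```python
import numpy as np

def unified_embedding(sizes, pairwise, k):
    """Embed objects of several types into one k-dimensional space.

    sizes    : number of objects of each type, e.g. [3, 4, 2]
    pairwise : dict mapping (i, j) with i < j to a co-occurrence matrix
               of shape (sizes[i], sizes[j]); missing pairs stay zero
    k        : number of leading eigenvectors to keep
    Assumption: plain symmetric block matrix, no weighting or scaling.
    """
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    total = int(offsets[-1])
    R = np.zeros((total, total))
    for (i, j), M in pairwise.items():
        M = np.asarray(M, dtype=float)
        if M.shape != (sizes[i], sizes[j]):
            raise ValueError(f"block {(i, j)} has shape {M.shape}")
        # Place the block and its transpose so that R stays symmetric.
        R[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = M
        R[offsets[j]:offsets[j + 1], offsets[i]:offsets[i + 1]] = M.T
    vals, vecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]      # indices of the k largest
    return vals[top], vecs[:, top]

# Hypothetical toy data: 3 documents, 4 terms, 2 authors.
doc_term = np.array([[2, 0, 1, 0],
                     [0, 1, 0, 3],
                     [1, 1, 0, 0]])
doc_author = np.array([[1, 0],
                       [0, 1],
                       [1, 1]])
vals3, vecs3 = unified_embedding(
    [3, 4, 2], {(0, 1): doc_term, (0, 2): doc_author}, k=2
)

# With only two object types the same construction reduces to the
# singular value decomposition that ordinary LSA is built on.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])
vals2, vecs2 = unified_embedding([2, 3], {(0, 1): A}, k=1)
```

Each row of `vecs3` is the coordinate vector of one document, term, or author, so objects of different types can be compared directly. In the two-type case the largest eigenvalue of the block matrix equals the largest singular value of `A`, which is one way to see how ordinary LSA can arise as a special case of a multi-type construction.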
Index Terms
- Latent semantic analysis for multiple-type interrelated data objects