DOI: 10.1145/1148170.1148214
Article

Latent semantic analysis for multiple-type interrelated data objects

Published: 06 August 2006

ABSTRACT

Co-occurrence data is common in many real applications. Latent Semantic Analysis (LSA) has been successfully used to identify semantic relations in such data. However, LSA can only handle a single co-occurrence relationship between two types of objects. In practical applications, there are many cases where multiple types of objects exist and any pair of these types could have a pairwise co-occurrence relation. All these co-occurrence relations can be exploited to alleviate data sparseness or to represent objects more meaningfully. In this paper, we propose a novel algorithm, M-LSA, which conducts latent semantic analysis by incorporating all pairwise co-occurrences among multiple types of objects. Based on the mutual reinforcement principle, M-LSA identifies the most salient concepts among the co-occurrence data and represents all the objects in a unified semantic space. M-LSA is general, and we show that several variants of LSA are special cases of our algorithm. Experimental results show that M-LSA outperforms LSA on multiple applications, including collaborative filtering, text clustering, and text categorization.
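As a rough illustration of the single-relation case that the abstract contrasts M-LSA with, the sketch below runs plain LSA (a truncated SVD) on one co-occurrence matrix. It also checks a standard linear-algebra fact: the largest eigenvalues of the symmetric block matrix built from that one relation equal its largest singular values, which is one way to picture two object types reinforcing each other in a shared space. This is not the paper's M-LSA algorithm, whose construction is not given in the abstract. The function names `lsa_embed` and `unified_matrix` and the toy term-document counts are assumptions made up for this example.

```python
import numpy as np

def lsa_embed(cooc, k):
    """Rank-k LSA: embed the row and column objects of one co-occurrence matrix."""
    u, s, vt = np.linalg.svd(cooc, full_matrices=False)
    row_emb = u[:, :k] * s[:k]       # row objects (e.g. terms)
    col_emb = vt[:k, :].T * s[:k]    # column objects (e.g. documents)
    return row_emb, col_emb, s[:k]

def unified_matrix(cooc):
    """Symmetric block matrix [[0, A], [A^T, 0]] covering both object types.

    Illustrative only: an assumption for this sketch, not the paper's construction.
    """
    n, m = cooc.shape
    return np.block([[np.zeros((n, n)), cooc],
                     [cooc.T, np.zeros((m, m))]])

# Hypothetical toy data: 4 terms x 3 documents.
A = np.array([[2., 0., 1.],
              [1., 0., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])

row_emb, col_emb, top_s = lsa_embed(A, k=2)

# Largest eigenvalues of the block matrix, sorted in descending order.
top_eigs = np.sort(np.linalg.eigvalsh(unified_matrix(A)))[::-1][:2]
```

Here `row_emb` and `col_emb` place terms and documents in the same 2-dimensional space, and `top_eigs` matches `top_s` up to floating-point error.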

Published in

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN: 1595933697
DOI: 10.1145/1148170

Copyright © 2006 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
