DOI: 10.1145/1281192.1281199
Article

Predictive discrete latent factor models for large scale dyadic data

Published: 12 August 2007

ABSTRACT

We propose a novel statistical method to predict large scale dyadic response variables in the presence of covariate information. Our approach simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model. The discovered latent factors provide a predictive model that is both accurate and interpretable. We illustrate our method by working in a framework of generalized linear models, which include commonly used regression techniques like linear regression, logistic regression and Poisson regression as special cases. We also provide scalable generalized EM-based algorithms for model fitting using both "hard" and "soft" cluster assignments. We demonstrate the generality and efficacy of our approach through large scale simulation studies and analysis of datasets obtained from certain real-world movie recommendation and internet advertising applications.
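The abstract's core idea, fitting a covariate (regression) effect together with block effects indexed by discrete row and column clusters using "hard" assignments, can be sketched for the simplest case. The Python below is a minimal illustration under our own assumptions, not the authors' algorithm: a Gaussian response (the linear-regression special case of the GLM framework), a single scalar covariate per dyad, an additive model y_ij ≈ beta * x_ij + delta[I(i)][J(j)], and plain squared-error coordinate descent standing in for the paper's generalized EM. The function name `fit_pdlf_hard` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def fit_pdlf_hard(dyads, n_rows, n_cols, K, L, n_iter=100, tol=1e-12, seed=0):
    """Hard-assignment alternating fit for an assumed Gaussian special case.

    dyads: list of (i, j, x, y) tuples for observed (row, column) pairs, where
    x is a single scalar covariate (a simplifying assumption of this sketch).
    Assumed model: y_ij ~ beta * x_ij + delta[row_c[i]][col_c[j]].
    """
    rng = random.Random(seed)
    row_c = [rng.randrange(K) for _ in range(n_rows)]  # row cluster per row
    col_c = [rng.randrange(L) for _ in range(n_cols)]  # column cluster per column
    beta = 0.0
    delta = [[0.0] * L for _ in range(K)]
    # Index observed dyads by row and by column so reassignment stays sparse.
    by_row, by_col = defaultdict(list), defaultdict(list)
    for i, j, x, y in dyads:
        by_row[i].append((j, x, y))
        by_col[j].append((i, x, y))
    sxx = sum(x * x for _, _, x, _ in dyads)

    def sse():
        return sum((y - beta * x - delta[row_c[i]][col_c[j]]) ** 2
                   for i, j, x, y in dyads)

    history = []
    for _ in range(n_iter):
        # 1. Block effects: mean covariate-adjusted response per cluster pair.
        sums = [[0.0] * L for _ in range(K)]
        counts = [[0] * L for _ in range(K)]
        for i, j, x, y in dyads:
            sums[row_c[i]][col_c[j]] += y - beta * x
            counts[row_c[i]][col_c[j]] += 1
        for k in range(K):
            for l in range(L):
                if counts[k][l]:  # empty blocks keep their previous value
                    delta[k][l] = sums[k][l] / counts[k][l]
        # 2. Covariate effect: least squares of block-adjusted residuals on x.
        if sxx > 0:
            beta = sum(x * (y - delta[row_c[i]][col_c[j]])
                       for i, j, x, y in dyads) / sxx
        # 3. Move each row to the row cluster with the smallest squared error.
        for i, obs in by_row.items():
            row_c[i] = min(range(K), key=lambda k: sum(
                (y - beta * x - delta[k][col_c[j]]) ** 2 for j, x, y in obs))
        # 4. Same for each column, given the updated row clusters.
        for j, obs in by_col.items():
            col_c[j] = min(range(L), key=lambda l: sum(
                (y - beta * x - delta[row_c[i]][l]) ** 2 for i, x, y in obs))
        history.append(sse())
        if len(history) > 1 and history[-2] - history[-1] <= tol:
            break
    return beta, delta, row_c, col_c, history
```

Each step minimizes the squared error over one group of parameters while holding the others fixed, so the recorded error in `history` never increases. As with k-means, the result depends on the random initial cluster assignments, so trying several `seed` values is a sensible precaution. A soft-assignment variant would replace the `min` reassignments with posterior cluster-membership probabilities.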


Supplemental Material

p26-agarwal-200.mov (MOV, 38.9 MB)

p26-agarwal-768.mov (MOV, 128.1 MB)


        Published in
        KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2007
        1080 pages
        ISBN:9781595936097
        DOI:10.1145/1281192

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        KDD '07 paper acceptance rate: 111 of 573 submissions, 19%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
