Article

Predictive discrete latent factor models for large scale dyadic data

Authors: Deepak Agarwal, Srujana Merugu

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 26 - 35

https://doi.org/10.1145/1281192.1281199

Published: 12 August 2007

Abstract

We propose a novel statistical method to predict large scale dyadic response variables in the presence of covariate information. Our approach simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model. The discovered latent factors provide a predictive model that is both accurate and interpretable. We illustrate our method by working in a framework of generalized linear models, which include commonly used regression techniques like linear regression, logistic regression and Poisson regression as special cases. We also provide scalable generalized EM-based algorithms for model fitting using both "hard" and "soft" cluster assignments. We demonstrate the generality and efficacy of our approach through large scale simulation studies and analysis of datasets obtained from certain real-world movie recommendation and internet advertising applications.
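As a rough illustration of the "hard" assignment idea described in the abstract, the sketch below assumes a Gaussian (linear regression) response in which each dyad's mean is a covariate term plus an offset shared by its row cluster and column cluster. The function name `fit_hard_blocks`, the array layout, the squared-error objective, and the update order are assumptions made for this example; they are not taken from the paper, which works with general GLMs and also covers "soft" assignments.

```python
import numpy as np

def fit_hard_blocks(Y, X, k, l, n_iter=20, seed=0):
    """Alternating least-squares sketch (hypothetical, Gaussian case only).

    Y: (m, n) dyadic responses; X: (m, n, p) dyad covariates;
    k, l: number of row and column clusters.
    Model assumed here: Y[i, j] ~ X[i, j] @ beta + delta[rows[i], cols[j]].
    """
    rng = np.random.default_rng(seed)
    m, n, p = X.shape
    rows = rng.integers(k, size=m)   # random initial row clusters
    cols = rng.integers(l, size=n)   # random initial column clusters
    beta = np.zeros(p)
    delta = np.zeros((k, l))
    for _ in range(n_iter):
        # Block offsets: mean residual inside each (row, column) co-cluster.
        R = Y - X @ beta
        for a in range(k):
            for b in range(l):
                block = R[np.ix_(rows == a, cols == b)]
                delta[a, b] = block.mean() if block.size else 0.0
        # Regression coefficients: least squares on the offset-adjusted response.
        T = Y - delta[np.ix_(rows, cols)]
        beta, *_ = np.linalg.lstsq(X.reshape(-1, p), T.ravel(), rcond=None)
        # Hard reassignment of rows, then columns, to the cheapest cluster.
        R = Y - X @ beta
        row_cost = ((R[:, None, :] - delta[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = row_cost.argmin(axis=1)
        col_cost = ((R.T[:, None, :] - delta[rows, :].T[None, :, :]) ** 2).sum(axis=2)
        cols = col_cost.argmin(axis=1)
    return beta, delta, rows, cols

# Usage on synthetic data (values chosen only for illustration).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(12, 10, 2))
Y_demo = X_demo @ np.array([1.0, -2.0]) + rng.normal(size=(12, 10))
beta_hat, delta_hat, row_labels, col_labels = fit_hard_blocks(Y_demo, X_demo, k=2, l=3)
prediction = X_demo @ beta_hat + delta_hat[np.ix_(row_labels, col_labels)]
```

Each of the four updates minimizes the squared error with the other quantities held fixed, so the objective never increases from one step to the next. The result depends on the random initialization and may be a local optimum.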

Index Terms

Predictive discrete latent factor models for large scale dyadic data
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory

Published In

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2007

1080 pages

ISBN:9781595936097

DOI:10.1145/1281192

General Chair: Pavel Berkhin (Yahoo!, USA)

Program Chairs: Rich Caruana (Cornell University, USA), Xindong Wu (University of Vermont, USA)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

KDD07: The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 12 - 15, 2007

San Jose, California, USA

Acceptance Rates

KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total Citations: 44
Total Downloads: 1,898
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 1

Reflects downloads up to 01 Mar 2025

Cited By

Boutalbi R, Labiod L, Nadif M (2023). Latent Block Regression Model. Classification and Data Science in the Digital Age, pp. 73-81. Online publication date: 8-Dec-2023.
https://doi.org/10.1007/978-3-031-09034-9_9
Liu Z, Wang H, Chen W, Wang L, Li T (2022). Bilateral discriminative autoencoder model orienting co-representation learning. Knowledge-Based Systems, 245, 108653. Online publication date: Jun-2022.
https://doi.org/10.1016/j.knosys.2022.108653
Yang W, Wu L, Liu X, Fan C (2018). MFCC: An Efficient and Effective Matrix Factorization Model Based on Co-clustering. Internet Multimedia Computing and Service, pp. 360-370. Online publication date: 1-Mar-2018.
https://doi.org/10.1007/978-981-10-8530-7_35
Deodhar M, Ghosh J, Saar-Tsechansky M, Keshari V (2017). Active Learning with Multiple Localized Regression Models. INFORMS Journal on Computing, 29(3), pp. 503-522. Online publication date: Aug-2017.
https://doi.org/10.1287/ijoc.2016.0732
Anggistia M, Saefuddin A, Sartono B (2017). Simultaneous Co-Clustering and Classification in Customers Insight. Journal of Physics: Conference Series, 824, 012033. Online publication date: 18-Apr-2017.
https://doi.org/10.1088/1742-6596/824/1/012033
Yang L, Huang W, Niu X (2017). Defending shilling attacks in recommender systems using soft co-clustering. IET Information Security, 11(6), pp. 319-325. Online publication date: 1-Nov-2017.
https://doi.org/10.1049/iet-ifs.2016.0345
Wang T (2017). A group interest-based collaborative filtering algorithm for multimedia information. Multimedia Tools and Applications, 77(4), pp. 4401-4415. Online publication date: 24-Dec-2017.
https://doi.org/10.1007/s11042-017-5516-x
Xu Y, Yu Q, Lam W, Lin T (2017). Exploiting interactions of review text, hidden user communities and item groups, and time for collaborative filtering. Knowledge and Information Systems, 52(1), pp. 221-254. Online publication date: 1-Jul-2017.
https://dl.acm.org/doi/10.1007/s10115-016-1005-1
Huang S, Wang H, Li T, Yang Y, Li T (2016). Constraint Co-Projections for Semi-Supervised Co-Clustering. IEEE Transactions on Cybernetics, 46(12), pp. 3047-3058. Online publication date: Dec-2016.
https://doi.org/10.1109/TCYB.2015.2496174
Wang F, Wang G, Lin S, Yu P (2015). Concurrent goal-oriented co-clustering generation in social networks. Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015), pp. 350-357. Online publication date: Feb-2015.
https://doi.org/10.1109/ICOSC.2015.7050833
