
Probabilistic dyadic data analysis with local and global consistency

Research article
Published: 14 June 2009
DOI: 10.1145/1553374.1553388

Abstract

Dyadic data arises in many real-world applications such as social network analysis and information retrieval. To discover the underlying or hidden structure in dyadic data, many topic modeling techniques have been proposed; typical algorithms include Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). The probability density functions obtained by both of these algorithms are supported on the Euclidean space. However, many previous studies have shown that naturally occurring data may reside on or close to an underlying submanifold. We introduce a probabilistic framework for modeling both the topical and geometrical structure of dyadic data that explicitly takes the local manifold structure into account. Specifically, the local manifold structure is modeled by a graph, and the graph Laplacian, analogous to the Laplace-Beltrami operator on manifolds, is applied to smooth the probability density functions. As a result, the obtained probability distributions are concentrated around the data manifold. Experimental results on real data sets demonstrate the effectiveness of the proposed approach.
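
The smoothing idea described above can be sketched in a few lines: build a nearest-neighbor graph over the documents, form its graph Laplacian L = D - W, and penalize topic distributions that differ sharply across graph edges. The snippet below is a minimal, illustrative sketch of that general idea only; the binary k-nearest-neighbor graph, the squared-difference penalty trace(Theta^T L Theta) on per-document topic proportions, and the function names and toy data are assumptions made for this example, not the paper's exact objective, which folds such a regularizer into the topic model's likelihood.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Illustrative sketch only: it shows graph-Laplacian smoothing of
# per-document topic distributions in general, not this paper's exact
# objective or its fitting procedure. The k-NN graph, binary weights,
# and penalty form below are assumptions made for this example.

def knn_laplacian(X, k=5):
    """Build a symmetric k-NN graph over documents X (n_docs x n_terms)
    and return the unnormalized graph Laplacian L = D - W."""
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
    W = np.maximum(W, W.T)          # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))      # degree matrix
    return D - W

def smoothness_penalty(theta, L):
    """Laplacian smoothness of topic proportions theta (n_docs x n_topics):
    (1/2) * sum_ij W_ij ||theta_i - theta_j||^2 = trace(theta^T L theta)."""
    return np.trace(theta.T @ L @ theta)

# Toy usage with random term counts and topic proportions.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(20, 50)).astype(float)   # document-term counts
theta = rng.dirichlet(np.ones(4), size=20)          # P(z|d) for 4 topics
L = knn_laplacian(X, k=5)
penalty = smoothness_penalty(theta, L)
# A regularized topic model of this kind would weight the penalty by a
# trade-off parameter lambda and combine it with the log-likelihood
# before each (generalized) EM-style update.
print(f"Laplacian smoothness penalty: {penalty:.4f}")
```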

References

[1]
Belkin, M., & Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems 14, 585--591. Cambridge, MA: MIT Press.
[2]
Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from examples. Journal of Machine Learning Research, 7, 2399--2434.
[3]
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993--1022.
[4]
Cai, D., Mei, Q., Han, J., & Zhai, C. (2008). Modeling hidden topics on document manifold. CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 911--920).
[5]
Chung, F. R. K. (1997). Spectral graph theory, vol. 92 of Regional Conference Series in Mathematics. AMS.
[6]
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391--407.
[7]
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1--38.
[8]
Hofmann, T. (1999). Probabilistic latent semantic indexing. Proc. 1999 Int. Conf. on Research and Development in Information Retrieval (pp. 50--57). Berkeley, CA.
[9]
Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42, 177--196.
[10]
Hofmann, T., Puzicha, J., & Jordan, M. I. (1998). Learning from dyadic data. In Advances in neural information processing systems 11, 466--472.
[11]
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. European conference on machine learning (pp. 137--142).
[12]
Li, W., & McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. Proc. 2006 Int. Conf. Machine Learning (pp. 577--584).
[13]
Lovász, L., & Plummer, M. (1986). Matching theory. Budapest: Akadémiai Kiadó; Amsterdam: North-Holland.
[14]
Mei, Q., Cai, D., Zhang, D., & Zhai, C. (2008). Topic modeling with network regularization. WWW '08: Proceedings of the 17th International Conference on World Wide Web (pp. 101--110).
[15]
Ng, A. Y., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems 14, 849--856. Cambridge, MA: MIT Press.
[16]
Paige, C. C., & Saunders, M. A. (1982). LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software, 8, 43--71.
[17]
Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. Proceedings of the 20th conference on Uncertainty in artificial intelligence (pp. 487--494).
[18]
Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323--2326.
[19]
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888--905.
[20]
Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. Proc. 2003 Int. Conf. on Research and Development in Information Retrieval (SIGIR'03) (pp. 267--273). Toronto, Canada.
[21]
Zhu, X., & Lafferty, J. (2005). Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. ICML '05: Proceedings of the 22nd international conference on Machine learning (pp. 1052--1059). Bonn, Germany.


Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN:9781605585161
DOI:10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States



Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


