
Probabilistic dyadic data analysis with local and global consistency

Research article
Published: 14 June 2009
DOI: 10.1145/1553374.1553388

Abstract

Dyadic data arises in many real-world applications such as social network analysis and information retrieval. To discover the underlying or hidden structure in dyadic data, many topic modeling techniques have been proposed; typical algorithms include Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). The probability density functions obtained by both of these algorithms are supported on the Euclidean space. However, many previous studies have shown that naturally occurring data may reside on or close to an underlying submanifold. We introduce a probabilistic framework for modeling both the topical and geometrical structure of dyadic data that explicitly takes the local manifold structure into account. Specifically, the local manifold structure is modeled by a graph, and the graph Laplacian, analogous to the Laplace-Beltrami operator on manifolds, is applied to smooth the probability density functions. As a result, the obtained probability distributions are concentrated around the data manifold. Experimental results on real data sets demonstrate the effectiveness of the proposed approach.
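
The smoothing idea described above can be sketched in a few lines: build a nearest-neighbor graph over the documents, form its graph Laplacian L = D - W, and penalize topic distributions that differ sharply across graph edges. The snippet below is a minimal, illustrative sketch of that general idea only; the binary k-nearest-neighbor graph, the squared-difference penalty trace(Theta^T L Theta) on per-document topic proportions, and the function names and toy data are assumptions made for this example, not the paper's exact objective, which folds such a regularizer into the topic model's likelihood.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Illustrative sketch only: it shows graph-Laplacian smoothing of
# per-document topic distributions in general, not this paper's exact
# objective or its fitting procedure. The k-NN graph, binary weights,
# and penalty form below are assumptions made for this example.

def knn_laplacian(X, k=5):
    """Build a symmetric k-NN graph over documents X (n_docs x n_terms)
    and return the unnormalized graph Laplacian L = D - W."""
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
    W = np.maximum(W, W.T)          # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))      # degree matrix
    return D - W

def smoothness_penalty(theta, L):
    """Laplacian smoothness of topic proportions theta (n_docs x n_topics):
    (1/2) * sum_ij W_ij ||theta_i - theta_j||^2 = trace(theta^T L theta)."""
    return np.trace(theta.T @ L @ theta)

# Toy usage with random term counts and topic proportions.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(20, 50)).astype(float)   # document-term counts
theta = rng.dirichlet(np.ones(4), size=20)          # P(z|d) for 4 topics
L = knn_laplacian(X, k=5)
penalty = smoothness_penalty(theta, L)
# A regularized topic model of this kind would weight the penalty by a
# trade-off parameter lambda and combine it with the log-likelihood
# before each (generalized) EM-style update.
print(f"Laplacian smoothness penalty: {penalty:.4f}")
```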

References

[1]
Belkin, M., & Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems 14, 585--591. Cambridge, MA: MIT Press.
[2]
Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from examples. Journal of Machine Learning Research, 7, 2399--2434.
[3]
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993--1022.
[4]
Cai, D., Mei, Q., Han, J., & Zhai, C. (2008). Modeling hidden topics on document manifold. CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 911--920).
[5]
Chung, F. R. K. (1997). Spectral graph theory, vol. 92 of Regional Conference Series in Mathematics. AMS.
[6]
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391--407.
[7]
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1--38.
[8]
Hofmann, T. (1999). Probabilistic latent semantic indexing. Proc. 1999 Int. Conf. on Research and Development in Information Retrieval (pp. 50--57). Berkeley, CA.
[9]
Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42, 177--196.
[10]
Hofmann, T., Puzicha, J., & Jordan, M. I. (1998). Learning from dyadic data. In Advances in neural information processing systems 11, 466--472.
[11]
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. European conference on machine learning (pp. 137--142).
[12]
Li, W., & McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. Proc. 2006 Int. Conf. Machine Learning (pp. 577--584).
[13]
Lovász, L., & Plummer, M. (1986). Matching theory. Budapest: Akadémiai Kiadó; Amsterdam: North-Holland.
[14]
Mei, Q., Cai, D., Zhang, D., & Zhai, C. (2008). Topic modeling with network regularization. WWW '08: Proceedings of the 17th International Conference on World Wide Web (pp. 101--110).
[15]
Ng, A. Y., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems 14, 849--856. Cambridge, MA: MIT Press.
[16]
Paige, C. C., & Saunders, M. A. (1982). LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software, 8, 43--71.
[17]
Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. Proceedings of the 20th conference on Uncertainty in artificial intelligence (pp. 487--494).
[18]
Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323--2326.
[19]
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888--905.
[20]
Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. Proc. 2003 Int. Conf. on Research and Development in Information Retrieval (SIGIR'03) (pp. 267--273). Toronto, Canada.
[21]
Zhu, X., & Lafferty, J. (2005). Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. ICML '05: Proceedings of the 22nd international conference on Machine learning (pp. 1052--1059). Bonn, Germany.


Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN:9781605585161
DOI:10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States



Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


