Large-scale supervised similarity learning in networks

Chang, Shiyu; Qi, Guo-Jun; Yang, Yingzhen; Aggarwal, Charu C.; Zhou, Jiayu; Wang, Meng; Huang, Thomas S.

doi:10.1007/s10115-015-0894-8

Regular paper
Published: 20 October 2015

Volume 48, pages 707–740, (2016)

Knowledge and Information Systems

Shiyu Chang¹,
Guo-Jun Qi²,
Yingzhen Yang¹,
Charu C. Aggarwal³,
Jiayu Zhou⁴,
Meng Wang⁵ &
Thomas S. Huang¹

452 Accesses
2 Citations

Abstract

The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which combine several aspects: topological structure, content, and user supervision. These aspects need to be combined effectively in order to create a holistic similarity function. In particular, while most similarity learning methods for networks, such as SimRank, utilize the topological structure, user supervision and content are rarely considered. In this paper, a factorized similarity learning (FSL) method is proposed to integrate links, node content, and user supervision into a unified framework. The model is learned using matrix factorization, and the final similarities are approximated by the span of low-rank matrices. The proposed framework is further extended to a noise-tolerant version by adopting a hinge loss as an alternative loss function. To facilitate efficient computation on large-scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA data sets. The results show that FSL is robust and efficient and outperforms the state of the art. The code for the learning algorithm used in our experiments is available at http://www.ifp.illinois.edu/~chang87/.
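
As a rough illustration of the low-rank idea in the abstract, the sketch below fits a symmetric similarity matrix S by a product U U^T using only the supervised pairs. It is not the FSL objective, which the abstract does not spell out: the squared loss, the function name factorize_similarity, the toy matrix, and all hyperparameters are assumptions made for this example, and node content and the hinge-loss variant are left out.

```python
import numpy as np

def factorize_similarity(S, mask, rank=2, lr=0.01, steps=500, reg=0.01, seed=0):
    """Fit S ~= U @ U.T on entries where mask == 1 (hypothetical helper).

    Plain gradient descent on a regularized squared loss; this is an
    illustrative stand-in, not the objective used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    U = 0.1 * rng.standard_normal((n, rank))  # small random low-rank factor
    losses = []
    for _ in range(steps):
        R = mask * (U @ U.T - S)              # residual on supervised pairs only
        losses.append(0.5 * np.sum(R ** 2) + 0.5 * reg * np.sum(U ** 2))
        grad = (R + R.T) @ U + reg * U        # gradient of the loss w.r.t. U
        U -= lr * grad
    return U, losses

# Toy network of six nodes in two groups: similarity 1 inside a group, 0 across.
S = np.zeros((6, 6))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0

# Binary symmetric mask: 1 where supervision is available, 0 where it is hidden.
mask = np.ones((6, 6))
mask[0, 1] = mask[1, 0] = 0.0
mask[2, 5] = mask[5, 2] = 0.0

U, losses = factorize_similarity(S, mask)
S_hat = U @ U.T  # low-rank approximation of the full similarity matrix
```

Entries of S_hat at the hidden pairs can then be read as predicted similarities for pairs that carried no supervision.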

References

Aggarwal CC (2003) Towards systematic design of distance functions for data mining applications. In: Proceedings of the ninth ACM SIGKDD, ACM, pp 9–18
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965
Birgin EG, Martínez JM, Raydan M (2000) Nonmonotone spectral projected gradient methods on convex sets. SIAM J Optim 10(4):1196–1211
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
Cai JF, Candès EJ, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982
Chang S, Qi G, Aggarwal C, Zhou J, Wang M, Huang T (2014) Factorized similarity learning in networks. In: ICDM, pp 60–69
Cheney W, Goldstein AA (1959) Proximity maps for convex sets. Proc Am Math Soc 10(3):448–450
Davis JV, Kulis B, Jain P, Sra S, Dhillon IS (2007) Information-theoretic metric learning. In: ICML, pp 209–216
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Deng H, Han J, Zhao B, Yu Y, Lin CX (2011) Probabilistic topic models with biased propagation on heterogeneous information networks. In: SIGKDD, pp 1271–1279
Geerts F, Mannila H, Terzi E (2004) Relational link-based ranking. In: VLDB, pp 552–563
Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2004) Neighbourhood components analysis. In: NIPS, pp 513–520
Han SP (1988) A successive projection method. Math Progr 40(1–3):1–14
Hoi SCH, Liu W, Chang SF (2008) Semi-supervised distance metric learning for collaborative image retrieval. In: CVPR, IEEE computer society
Jeh G, Widom J (2002) Simrank: a measure of structural-context similarity. In: SIGKDD, pp 538–543
Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37
Kotz S, Kozubowski T, Podgorski K (2001) The laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Progress in mathematics series. Birkhäuser, Boston
Kumar N, Kummamuru K, Paranjpe D (2005) Semi-supervised clustering with metric learning using relative comparisons. In: Fifth IEEE international conference on data mining, p 4
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791
Li Z, Chang S, Liang F, Huang TS, Cao L, Smith JR (2013) Learning locally-adaptive decision functions for person verification. In: CVPR, 2013
Lin Z, King I, Lyu M (2006) Pagesim: a novel link-based similarity measure for the world wide web. In: IEEE/WIC/ACM international conference on web intelligence, 2006. WI 2006, pp 687–693
Liu X, Ji R, Yao H, Xu P, Sun X, Liu T (2008) Cross-media manifold learning for image retrieval and annotation. In: Lew MS, Bimbo AD, Bakker EM (eds) Multimedia information retrieval. ACM, New York, pp 141–148
Ma H, Yang H, Lyu MR, King I (2008) Sorec: social recommendation using probabilistic matrix factorization. In: CIKM, pp 931–940
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York
McCallum AK, Nigam K, Rennie J, Seymore K (2000) Automating the construction of internet portals with machine learning. Inf Retr 3(2):127–163
McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444
Mnih A, Salakhutdinov R (2007) Probabilistic matrix factorization. In: NIPS, pp 1257–1264
Nesterov Y, Nesterov IE (2004) Introductory lectures on convex optimization: a basic course, vol 87. Springer, Berlin
Paatero P, Tapper U (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2):111–126
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web. Technical report 1999-66, Stanford InfoLab, November 1999. Previous number = SIDL-WP-1999-0120
Purushotham S, Liu Y, Kuo CCJ (2012) Collaborative topic regression with social matrix factorization for recommendation systems. In: ICML, 2012
Qi GJ, Aggarwal C, Tian Q, Ji H, Huang T (2012) Exploring context and content links in social media: a latent space method. IEEE Trans Pattern Anal Mach Intell 34(5):850–862
Qi GJ, Tang J, Zha ZJ, Chua TS, Zhang HJ (2009) An efficient sparse metric learning in high-dimensional space via l1-penalized log-determinant regularization. In: ICML, pp 841–848
Qian B, Wang X, Wang F, Li H, Ye J, Davidson I (2013) Active learning from relative queries. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 1614–1620
Qian B, Wang X, Wang J, Li H, Cao N, Zhi W, Davidson I (2013) Fast pairwise query selection for large-scale active learning to rank. In: IEEE 13th international conference on data mining (ICDM), 2013, pp 607–616
Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for svm. In: ICML, pp 807–814
Tang J, Yan S, Hong R, Qi GJ, Chua TS (2009) Inferring semantic concepts from community-contributed images and noisy tags. In: SIGMM. ACM, pp 223–232
Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Wang C, Blei DM (2011) Collaborative topic modeling for recommending scientific articles. In: SIGKDD, pp 448–456
Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244
Wen Z, Yin W, Zhang Y (2012) Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math Progr Comput 4(4):333–361
Xi W, Fox EA, Fan W, Zhang B, Chen Z, Yan J, Zhuang D (2005) Simfusion: measuring similarity using unified relationship matrix. In: SIGIR, pp 130–137
Xing EP, Ng AY, Jordan MI, Russell S (2003) Distance metric learning, with application to clustering with side-information. In: NIPS, pp 505–512
Zeng C, Jiang Y, Zheng L, Li J, Li L, Li L, Shen C, Zhou W, Li T, Duan B, Lei M, Wang P (2013) Fiu-miner: a fast, integrated, and user-friendly system for data mining in distributed environment. In: SIGKDD, pp 1506–1509
Zhao P, Han J, Sun Y (2009) P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp 553–562
Zhou J, Lu Z, Sun J, Yuan L, Wang F, Ye J (2013) Feafiner: biomarker identification from medical data through feature generalization and selection. In: SIGKDD, pp 1034–1042

Acknowledgments

The work of Shiyu Chang and Thomas S. Huang was funded in part by the National Science Foundation under Grant Number 1318971 and the Samsung Global Research Program 2013 under Theme “Big Data and Network,” Subject “Privacy and Trust Management In Big Data Analysis.” This work was partially sponsored by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053.

Author information

Authors and Affiliations

Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Shiyu Chang, Yingzhen Yang & Thomas S. Huang
University of Central Florida, Orlando, FL, 32816, USA
Guo-Jun Qi
IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Charu C. Aggarwal
Michigan State University, East Lansing, MI, 48824, USA
Jiayu Zhou
Hefei University of Technology, Hefei, 230009, Anhui, China
Meng Wang

Corresponding author

Correspondence to Shiyu Chang.

Additional information

This paper is an extended journal version of the ICDM 2014 best student paper [6] for the “Best of ICDM” special issue.

About this article

Cite this article

Chang, S., Qi, GJ., Yang, Y. et al. Large-scale supervised similarity learning in networks. Knowl Inf Syst 48, 707–740 (2016). https://doi.org/10.1007/s10115-015-0894-8

Received: 10 December 2014
Revised: 01 July 2015
Accepted: 10 October 2015
Published: 20 October 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10115-015-0894-8
