Parallel collective factorization for modeling large heterogeneous networks

Rossi, Ryan A.; Zhou, Rong

doi:10.1007/s13278-016-0349-6

Original Article
Published: 01 September 2016

Volume 6, article number 67 (2016)

Social Network Analysis and Mining

Abstract

Relational learning methods for heterogeneous network data are becoming increasingly important for many real-world applications. However, existing relational learning approaches are sequential and inefficient, cannot scale to large heterogeneous networks, and suffer from other limitations related to convergence and parameter tuning. In this paper, we propose Parallel Collective Matrix Factorization (PCMF), a fast and flexible framework for jointly modeling a variety of heterogeneous network data. The PCMF learning algorithm solves for a single parameter given the others, leading to a parallel scheme that is fast, flexible, and general across relational learning tasks and heterogeneous data types. The proposed approach is designed to be (1) efficient for large heterogeneous networks (linear in the total number of observations from the set of input matrices), (2) flexible, as many components are interchangeable and easily adaptable, and (3) effective for a variety of applications and for different types of data. The experiments demonstrate the scalability, flexibility, and effectiveness of PCMF for a variety of relational modeling tasks. In particular, PCMF outperforms a recent state-of-the-art approach in runtime, scalability, and prediction quality. Finally, we investigate variants of PCMF for serving predictions in a real-time streaming fashion.
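The idea of solving for a single parameter given the others can be illustrated with a small coordinate descent sketch. This is not the PCMF algorithm itself, only a minimal example under stated assumptions: two partially observed matrices A (for example, user-by-item) and B (for example, user-by-user) share a row factor U, a squared-error loss with L2 regularization is used, and all names (`cd_terms`, `collective_cd`, `objective`, `alpha`, `lam`) are hypothetical. Each entry in column k of a factor has a closed-form update that does not depend on the other entries of that column, which is why such updates could be distributed across workers; here they are simply vectorized with NumPy.

```python
import numpy as np

def cd_terms(F, G, X, mask, k):
    """Numerator and denominator of the closed-form update for column k of F,
    given X ~ F @ G.T on the observed entries marked by mask."""
    mask = mask.astype(float)
    # Residual with the contribution of latent dimension k added back in.
    R = (X - F @ G.T + np.outer(F[:, k], G[:, k])) * mask
    num = R @ G[:, k]
    den = mask @ (G[:, k] ** 2)
    return num, den

def collective_cd(A, mask_A, B, mask_B, rank=5, alpha=1.0, lam=0.1,
                  iters=30, seed=0):
    """Jointly factorize A ~ U V^T and B ~ U Z^T with a shared factor U."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((A.shape[0], rank))
    V = 0.1 * rng.standard_normal((A.shape[1], rank))
    Z = 0.1 * rng.standard_normal((B.shape[1], rank))
    for _ in range(iters):
        for k in range(rank):
            # The shared factor U receives terms from both matrices.
            num_a, den_a = cd_terms(U, V, A, mask_A, k)
            num_b, den_b = cd_terms(U, Z, B, mask_B, k)
            U[:, k] = (num_a + alpha * num_b) / (lam + den_a + alpha * den_b)
            # V depends only on A, and Z only on B.
            num, den = cd_terms(V, U, A.T, mask_A.T, k)
            V[:, k] = num / (lam + den)
            num, den = cd_terms(Z, U, B.T, mask_B.T, k)
            Z[:, k] = alpha * num / (lam + alpha * den)
    return U, V, Z

def objective(A, mask_A, B, mask_B, U, V, Z, alpha=1.0, lam=0.1):
    """Regularized squared error over the observed entries of A and B."""
    err_a = ((mask_A * (A - U @ V.T)) ** 2).sum()
    err_b = ((mask_B * (B - U @ Z.T)) ** 2).sum()
    reg = lam * sum((M ** 2).sum() for M in (U, V, Z))
    return err_a + alpha * err_b + reg
```

Because every update is an exact minimization over one coordinate block with all others fixed, the objective never increases from one sweep to the next. The dense masks keep the sketch short; an implementation aiming for cost linear in the number of observations would iterate over stored nonzeros instead.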

Notes

Note that undirected homogeneous networks (symmetric matrices) are a special case of our framework.
A worker is a thread in a shared-memory setting and a machine in a distributed-memory architecture.
https://www.threadingbuildingblocks.org/.
The likelihood expression assumes noise in the data is Gaussian.
Edges were also sampled inversely proportional to the degree of each neighborhood node.
http://networkrepository.com.
A recently proposed parallel coordinate descent method for recommendation.
Speed may be fundamentally more important than accuracy.
Undirected social networks give rise to variants based on in/out/total degree.
Note that these are known actual relationships in the social network, but are not used for learning.
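
To illustrate the note on Gaussian noise: assuming each observed entry is the inner product of two latent factor vectors plus independent Gaussian noise, maximizing the likelihood is equivalent to minimizing squared error over the observed entries. The symbols $A_{ij}$, $\mathbf{u}_i$, $\mathbf{v}_j$, $\sigma$, and $\Omega$ (the set of observed index pairs) are illustrative choices, not notation taken from the article.

```latex
A_{ij} = \mathbf{u}_i^{\top}\mathbf{v}_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

-\log p(A \mid U, V)
  = \frac{1}{2\sigma^2} \sum_{(i,j) \in \Omega}
      \left( A_{ij} - \mathbf{u}_i^{\top}\mathbf{v}_j \right)^2
    + \frac{|\Omega|}{2} \log\left( 2\pi\sigma^2 \right)
```

The second term does not depend on the factors, so only the sum of squared residuals matters when fitting them.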

Author information

Authors and Affiliations

Palo Alto Research Center (PARC, a Xerox Company), 3333 Coyote Hill Rd, Palo Alto, CA, 94304, USA
Ryan A. Rossi & Rong Zhou

Corresponding author

Correspondence to Ryan A. Rossi.

About this article

Cite this article

Rossi, R.A., Zhou, R. Parallel collective factorization for modeling large heterogeneous networks. Soc. Netw. Anal. Min. 6, 67 (2016). https://doi.org/10.1007/s13278-016-0349-6

Received: 13 June 2015
Revised: 15 September 2015
Accepted: 14 June 2016
Published: 01 September 2016
DOI: https://doi.org/10.1007/s13278-016-0349-6
