Matrix factorization of large scale data using multistage matrix factorization

Applied Intelligence

Abstract

Matrix factorization (MF) is a resource-intensive task: it consumes significant memory and computation and does not scale well with the volume of data. When the input matrix and the latent feature matrices exceed the available memory, whether on a central processing unit (CPU) or a graphics processing unit (GPU), it may not be possible to load all the required matrices into CPU/GPU memory. Such scenarios call for alternative techniques that not only allow parallelism but also address memory limitations; such techniques play a crucial role in industrial applications. In this paper we propose a divide-and-conquer technique based on a two-stage factorization process. In the first stage, we divide the data set into groups and factorize each group. In the second stage, we use a factorization-based learning model to combine the latent features derived in the first stage. Our motivation is to develop a method that achieves both parallelism and scalability and also handles the factorization of incrementally growing data. Our contribution is a novel multi-stage matrix factorization (MsMF) approach. The experimental results demonstrate improvements in root-mean-square error (RMSE) as well as in computational efficiency.
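
The two-stage idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which solver is used per group or how the second stage combines the latent features, so everything below is an assumption made for illustration. In particular, the sketch assumes the groups are blocks of rows, uses regularized alternating least squares (ALS) as the per-group solver, and combines the groups by factorizing the horizontally stacked per-group item factors. All function names (`als`, `stage_one`, `stage_two`, `demo`) are hypothetical.

```python
import numpy as np

def als(R, mask, k, iters=20, reg=0.1, seed=0):
    """Regularized ALS on observed entries only, so that R ~ U @ V.T."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((m, k))
    ridge = reg * np.eye(k)
    for _ in range(iters):
        for i in range(n):  # update each row factor with V held fixed
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ R[i, mask[i]])
        for j in range(m):  # update each column factor with U held fixed
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ R[mask[:, j], j])
    return U, V

def stage_one(R, mask, k, n_groups):
    """Stage 1 (assumed grouping: consecutive row blocks), one factorization per group.
    The groups are independent, so these calls could run in parallel."""
    groups = np.array_split(np.arange(R.shape[0]), n_groups)
    return [als(R[rows], mask[rows], k, seed=g) for g, rows in enumerate(groups)]

def stage_two(factors, k):
    """Stage 2 (assumed combiner): factorize the stacked item factors.
    C = [V_1 ... V_G] ~ V @ W.T, hence V_g ~ V @ W_g.T and U_g @ V_g.T ~ (U_g @ W_g) @ V.T."""
    C = np.hstack([V_g for _, V_g in factors])          # shape m x (G*k), fully observed
    V, W = als(C, np.ones(C.shape, dtype=bool), k)      # W has shape (G*k) x k
    U = np.vstack([U_g @ W[g * k:(g + 1) * k] for g, (U_g, _) in enumerate(factors)])
    return U, V

def demo(n=40, m=30, k=3, n_groups=2, seed=0):
    """Synthetic low-rank data with about 60% of the entries observed."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n, k)) @ rng.standard_normal((m, k)).T
    mask = rng.random((n, m)) < 0.6
    U, V = stage_two(stage_one(R, mask, k, n_groups), k)
    rmse = np.sqrt(np.mean((R - U @ V.T)[mask] ** 2))   # RMSE on observed entries
    baseline = np.sqrt(np.mean(R[mask] ** 2))           # RMSE of predicting all zeros
    return rmse, baseline, U, V

if __name__ == "__main__":
    rmse, baseline, _, _ = demo()
    print(f"two-stage RMSE: {rmse:.3f}  zero-predictor RMSE: {baseline:.3f}")
```

Under these assumptions only one group's data needs to be in memory at a time during the first stage, and the second stage works on the much smaller stacked factor matrix rather than on the original data.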

Notes

  1. http://grouplens.org/datasets/

Author information

Corresponding author

Correspondence to Prasad Bhavana.

About this article

Cite this article

Bhavana, P., Padmanabhan, V. Matrix factorization of large scale data using multistage matrix factorization. Appl Intell 51, 4016–4028 (2021). https://doi.org/10.1007/s10489-020-01957-0

  • DOI: https://doi.org/10.1007/s10489-020-01957-0
