Accelerating Parallel ALS for Collaborative Filtering on Hadoop

Liang, Yi; Zeng, Shaokang; Liang, Yande; Chen, Kaizhong

doi:10.1007/978-3-030-49556-5_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12093)

Included in the following conference series:

International Symposium on Benchmarking, Measuring and Optimization

Abstract

Collaborative Filtering (CF) is an important building block of recommendation systems. Alternating Least Squares (ALS) is the most popular algorithm used in CF models to compute the latent-factor matrix factorization. Parallel ALS on Hadoop is widely used in the era of big data. However, existing work on the computational efficiency of parallel ALS on Hadoop has two shortcomings: the data distribution is imbalanced, and the rating data is not processed with fine-grained parallelism. To address these issues, we propose an integrated optimization solution. The solution first optimizes the partitioning of the rating data by considering both the number of data records involved and the size of the partitioned data. It then introduces multithread-based fine-grained parallelism so that the rating records within a map task are processed concurrently. Experimental results show that our solution reduces the overall runtime of Hadoop ALS by up to 82.17%.
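For readers unfamiliar with ALS, the alternating update can be sketched on a single machine as follows. This is a minimal illustration, not the authors' Hadoop implementation: it assumes a plain ridge penalty `lam` on all factors (the abstract does not state the regularization used), rank `k = 2`, and pure-Python dense k×k solves. The names `als`, `update_side`, `solve`, and `objective` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update_side(ratings_by_row, fixed, k, lam):
    """Hold the other side's factors fixed and re-solve each row's k-dim factor.

    For row u with ratings (j, r_uj): (sum_j y_j y_j^T + lam*I) x_u = sum_j r_uj y_j.
    """
    out = {}
    for row, entries in ratings_by_row.items():
        a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for col, r in entries:
            y = fixed[col]
            for i in range(k):
                b[i] += r * y[i]
                for j in range(k):
                    a[i][j] += y[i] * y[j]
        out[row] = solve(a, b)
    return out


def objective(ratings, x, y, lam):
    """Squared error on observed ratings plus the ridge penalty on all factors."""
    err = sum((r - sum(p * q for p, q in zip(x[u], y[i]))) ** 2 for u, i, r in ratings)
    reg = sum(v * v for side in (x, y) for vec in side.values() for v in vec)
    return err + lam * reg


def als(ratings, k=2, lam=0.1, iters=10, seed=0):
    """Alternate user and item solves; return factors and the objective after each sweep."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    x = {u: [rng.random() for _ in range(k)] for u in by_user}
    y = {i: [rng.random() for _ in range(k)] for i in by_item}
    history = [objective(ratings, x, y, lam)]
    for _ in range(iters):
        x = update_side(by_user, y, k, lam)  # item factors fixed, solve user factors
        y = update_side(by_item, x, k, lam)  # user factors fixed, solve item factors
        history.append(objective(ratings, x, y, lam))
    return x, y, history


# Toy (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
x, y, history = als(ratings)
```

Because each half-step is an exact least-squares solve, `history` never increases. Each user's solve depends only on that user's ratings and the fixed item factors, so all solves within a half-step are independent; this independence is what allows ALS to be spread across map tasks.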
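The abstract does not spell out the partitioning rule, so the following is only an illustrative sketch of balancing on two measures at once. It assigns per-key record groups (for example, all ratings of one user) to partitions with the greedy longest-processing-time heuristic, using a hypothetical cost that mixes each group's share of total records and its share of total bytes. The weight `alpha` and the function name `balanced_partition` are assumptions, not the paper's scheme.

```python
import heapq


def balanced_partition(groups, num_parts, alpha=0.5):
    """Greedy LPT assignment of record groups to partitions.

    groups: list of (key, num_records, num_bytes).
    Each group's cost is alpha * (its share of all records)
    + (1 - alpha) * (its share of all bytes); alpha is a hypothetical knob.
    """
    total_recs = sum(g[1] for g in groups) or 1
    total_bytes = sum(g[2] for g in groups) or 1

    def cost(g):
        return alpha * g[1] / total_recs + (1 - alpha) * g[2] / total_bytes

    heap = [(0.0, p) for p in range(num_parts)]  # (current load, partition id)
    heapq.heapify(heap)
    loads = [0.0] * num_parts
    assignment = {}
    for g in sorted(groups, key=cost, reverse=True):  # place the costliest groups first
        load, p = heapq.heappop(heap)  # least-loaded partition so far
        assignment[g[0]] = p
        loads[p] = load + cost(g)
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads


# Hypothetical groups: "u2" has few records but many bytes, "u1" the reverse.
groups = [("u1", 900, 9000), ("u2", 100, 50000), ("u3", 500, 5000),
          ("u4", 500, 40000), ("u5", 50, 500)]
assignment, loads = balanced_partition(groups, 2)
```

Balancing on record count alone would treat "u2" as a light group even though it carries the most bytes; mixing both shares keeps either measure from dominating. With greedy placement, the gap between the most and least loaded partitions never exceeds the largest single group cost.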
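The idea of processing the records of one map task concurrently can be sketched with a thread pool. This sketch assumes that each record holds one user's ratings and that the item factors are fixed and read-only during the step. It uses rank `k = 1`, so the ALS normal equation reduces to a scalar division. The names `solve_user` and `process_split` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_user(record, item_factors, lam):
    """Hypothetical per-record work: rank-1 ALS update for one user's ratings.

    record: (user, [(item, rating), ...]); item_factors: {item: float}.
    With k = 1 the normal equation is scalar: x = sum(r*y) / (sum(y*y) + lam).
    """
    user, entries = record
    num = sum(r * item_factors[i] for i, r in entries)
    den = sum(item_factors[i] ** 2 for i, _ in entries) + lam
    return user, num / den


def process_split(records, item_factors, lam=0.1, threads=4):
    """Process the records of one map task concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Records are independent, and item_factors is only read, so no locking is needed.
        # pool.map preserves input order, so the result matches a sequential loop.
        results = pool.map(lambda rec: solve_user(rec, item_factors, lam), records)
        return dict(results)


records = [("u1", [("a", 5.0), ("b", 3.0)]),
           ("u2", [("a", 4.0)]),
           ("u3", [("b", 2.0), ("c", 5.0)])]
item_factors = {"a": 1.0, "b": 0.5, "c": 2.0}
out = process_split(records, item_factors)
```

In CPython the global interpreter lock limits CPU-bound speedups, so this shows only the structure. On the JVM, where Hadoop tasks run, threads execute in parallel. Hadoop also ships `org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper` for running a thread-safe mapper with several threads per task, although the abstract does not say whether the authors used it.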

Supported by 2019 BenchCouncil AI System and Algorithm Challenge.

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
Yi Liang, Shaokang Zeng, Yande Liang & Kaizhong Chen

Corresponding author

Correspondence to Yi Liang.

Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Wanling Gao
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Jianfeng Zhan
School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Geoffrey Fox
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Xiaoyi Lu
Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, USA
Dan Stanzione

About this paper

Cite this paper

Liang, Y., Zeng, S., Liang, Y., Chen, K. (2020). Accelerating Parallel ALS for Collaborative Filtering on Hadoop. In: Gao, W., Zhan, J., Fox, G., Lu, X., Stanzione, D. (eds) Benchmarking, Measuring, and Optimizing. Bench 2019. Lecture Notes in Computer Science, vol 12093. Springer, Cham. https://doi.org/10.1007/978-3-030-49556-5_13

DOI: https://doi.org/10.1007/978-3-030-49556-5_13
Published: 09 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49555-8
Online ISBN: 978-3-030-49556-5
eBook Packages: Computer Science, Computer Science (R0)
