Implementation of an Alternating Least Square Model Based Collaborative Filtering Movie Recommendation System on Hadoop and Spark Platforms

Li, Jung-Bin; Lin, Szu-Yin; Hsu, Yu-Hsiang; Huang, Ying-Chu

doi:10.1007/978-3-030-02613-4_21

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 25)

Included in the following conference series:

International Conference on Broadband and Wireless Computing, Communication and Applications

Abstract

Nowadays, consumers and businesses alike face the problem of information explosion, and recommendation systems offer a powerful solution. This study implements a movie recommendation system that suggests films to viewers, encouraging them to consume more while shortening the interval between payments.

This research implements a prototype recommendation system based on collaborative filtering with the Alternating Least Squares (ALS) algorithm. Collaborative filtering has the advantage of avoiding possible violations of the Personal Information Protection Act and of reducing errors caused by poor-quality personal information. However, one of its shortcomings is scalability. Our study attempts to address this by adopting Spark with the Hadoop YARN platform, using Spark to compute movie recommendations and Hadoop to store data. The results show that the proposed system offers recommendations with satisfactory accuracy while keeping computation time acceptable.
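The abstract names ALS-based collaborative filtering without showing how the alternating updates work. The sketch below is a minimal, single-machine illustration in plain Python; it is not the authors' code and not Spark's implementation. It alternates between holding the movie factors fixed and solving a small regularized least-squares problem for each user, then holding the user factors fixed and solving for each movie. Each per-user (or per-movie) solve depends only on that row's ratings and the currently fixed factors, which is the property that lets a cluster framework such as Spark spread the work across machines. The function names (`solve`, `update_factors`, `als`, `predict`, `rmse`), the toy ratings, and the plain L2 penalty `reg` are assumptions for illustration; Spark MLlib's own ALS estimator differs in details such as how it scales the regularization term.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update_factors(groups, fixed, rank, reg):
    """For each key (a user or a movie), solve (F^T F + reg*I) x = F^T r,
    where F stacks the fixed factor vectors of the rows it has ratings with."""
    new = {}
    for key, pairs in groups.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            f = fixed[other]
            for i in range(rank):
                b[i] += f[i] * r
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        new[key] = solve(a, b)  # each key is independent, so this loop is parallelizable
    return new


def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """Factor sparse (user, item, rating) triples into user and item vectors of length rank."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}  # random start
    users = {}
    for _ in range(iterations):
        users = update_factors(by_user, items, rank, reg)  # item factors held fixed
        items = update_factors(by_item, users, rank, reg)  # user factors held fixed
    return users, items


def predict(users, items, u, i):
    """Predicted rating is the dot product of the user and item factor vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))


def rmse(ratings, users, items):
    """Root-mean-square error of the model over the given rating triples."""
    se = sum((predict(users, items, u, i) - r) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy data: (user, movie, rating); some pairs are unrated.
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 1.0),
               (1, 0, 4.0), (1, 2, 1.0),
               (2, 1, 1.0), (2, 2, 5.0),
               (3, 0, 1.0), (3, 2, 4.0)]
    users, items = als(ratings, rank=2, reg=0.1, iterations=50)
    print("training RMSE:", round(rmse(ratings, users, items), 3))
    print("predicted rating of movie 1 for user 1:", round(predict(users, items, 1, 1), 2))
```

Holding one side fixed turns each step into an ordinary regularized least-squares fit with a closed-form solution, so the total regularized error never increases from one sweep to the next. The unrated (user, movie) pairs are then filled in by `predict`, which is how such a model produces recommendations.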

Similar content being viewed by others

An Improved ALS Recommendation Model Based on Apache Spark

Recommender System with Apache Spark

Recommendation System for E-commerce Using Alternating Least Squares (ALS) on Apache Spark

Author information

Authors and Affiliations

Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City, Taiwan
Jung-Bin Li, Yu-Hsiang Hsu & Ying-Chu Huang
Department of Information Management, Chung Yuan Christian University, Taoyuan City, Taiwan
Szu-Yin Lin

Corresponding author

Correspondence to Jung-Bin Li.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Faculty of Information Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Tunghai University, Taichung, Taiwan
Fang-Yie Leu
Rissho University, Tokyo, Japan
Tomoya Enokido
Asia University, Taichung, Taiwan
Hsing-Chung Chen

About this paper

Cite this paper

Li, JB., Lin, SY., Hsu, YH., Huang, YC. (2019). Implementation of an Alternating Least Square Model Based Collaborative Filtering Movie Recommendation System on Hadoop and Spark Platforms. In: Barolli, L., Leu, FY., Enokido, T., Chen, HC. (eds) Advances on Broadband and Wireless Computing, Communication and Applications. BWCCA 2018. Lecture Notes on Data Engineering and Communications Technologies, vol 25. Springer, Cham. https://doi.org/10.1007/978-3-030-02613-4_21

DOI: https://doi.org/10.1007/978-3-030-02613-4_21
Published: 19 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02612-7
Online ISBN: 978-3-030-02613-4
eBook Packages: Engineering, Engineering (R0)
