Abstract
Nowadays, consumers and businesses alike face the problem of information explosion. Recommendation systems represent a powerful solution. This study implements a movie recommendation system that suggests films to viewers, enabling them to consume more while shortening the time interval between payments.
This research implements a prototype recommendation system based on collaborative filtering with the Alternating Least Squares (ALS) algorithm. Collaborative filtering has the advantage of avoiding possible violations of the Personal Information Protection Act and of reducing the likelihood of errors caused by poor-quality personal information. However, one of its shortcomings is scalability. Our study attempts to address this by adopting Spark on the Hadoop YARN platform, using Spark to compute movie recommendations and Hadoop to store the data. The results show that the proposed system offers recommendations with satisfactory accuracy while keeping computation time acceptable.
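To make the ALS idea concrete, the following is a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation. The function names (`als`, `update_factors`, `objective`, `rmse`), the hyperparameters (`rank`, `reg`, `iterations`), and the toy `RATINGS` list are all assumptions made up for illustration. The sketch alternates between fixing the item factors to solve a small regularized least-squares problem per user, and fixing the user factors to do the same per item.

```python
import random

# Hypothetical toy data: (user_id, movie_id, rating). Not from the paper.
RATINGS = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 1, 5.0), (1, 2, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def update_factors(fixed, observed_by_row, rank, reg):
    """Hold one side fixed; solve (sum v v^T + reg I) x = sum r v per row."""
    updated = {}
    for row_id, observed in observed_by_row.items():
        A = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col_id, r in observed:
            v = fixed[col_id]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    A[i][j] += v[i] * v[j]
        updated[row_id] = solve(A, b)
    return updated

def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """Alternate user and item updates, starting from random item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_factors(items, by_user, rank, reg)
        items = update_factors(users, by_item, rank, reg)
    return users, items

def predict(users, items, u, i):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))

def objective(ratings, users, items, reg):
    """Squared error on observed ratings plus L2 penalty on all factors."""
    se = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    norm = sum(x * x for f in (users, items) for v in f.values() for x in v)
    return se + reg * norm

def rmse(ratings, users, items):
    """Root-mean-square error over the given ratings."""
    se = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

users, items = als(RATINGS)
print("training RMSE:", rmse(RATINGS, users, items))
print("predicted rating, user 0 on movie 2:", predict(users, items, 0, 2))
```

Because each per-user and per-item solve is independent of the others, the updates within one half-step can run in parallel, which is the property a cluster platform such as Spark can exploit. The sketch reports error on the training ratings only; a real evaluation would hold out test ratings.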
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, JB., Lin, SY., Hsu, YH., Huang, YC. (2019). Implementation of an Alternating Least Square Model Based Collaborative Filtering Movie Recommendation System on Hadoop and Spark Platforms. In: Barolli, L., Leu, FY., Enokido, T., Chen, HC. (eds) Advances on Broadband and Wireless Computing, Communication and Applications. BWCCA 2018. Lecture Notes on Data Engineering and Communications Technologies, vol 25. Springer, Cham. https://doi.org/10.1007/978-3-030-02613-4_21
DOI: https://doi.org/10.1007/978-3-030-02613-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02612-7
Online ISBN: 978-3-030-02613-4
eBook Packages: Engineering, Engineering (R0)