Implementation of an Alternating Least Square Model Based Collaborative Filtering Movie Recommendation System on Hadoop and Spark Platforms

  • Conference paper
Advances on Broadband and Wireless Computing, Communication and Applications (BWCCA 2018)

Abstract

Nowadays, consumers and businesses alike face the problem of information explosion, and recommendation systems represent a powerful solution. This study implements a movie recommendation system that suggests films to viewers, encouraging them to consume more while shortening the interval between payments.

This research implements a prototype recommendation system based on collaborative filtering with the Alternating Least Squares (ALS) algorithm. Collaborative filtering has the advantage of avoiding possible violations of the Personal Information Protection Act and of reducing the possibility of errors caused by poor-quality personal information. However, one of its shortcomings is scalability. Our study attempts to improve scalability by adopting Spark on the Hadoop YARN platform, using Spark to compute movie recommendations and Hadoop to store data. The results show that the proposed system offers recommendations with satisfactory accuracy while keeping computation time acceptable.
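
To make the ALS idea concrete, the following is a minimal single-machine sketch in plain Python. It is not the Spark-based implementation described in the study. It factorizes a small rating set into user and item factor vectors by alternately fixing one side and solving a regularized least-squares problem for the other. The function names (`solve`, `update_factors`, `als`), the rank, the regularization weight `reg`, and the toy ratings are all illustrative assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve the small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def update_factors(fixed, observed_by_row, rank, reg):
    """With one side fixed, solve (F^T F + reg*I) x = F^T r for every row of the other side."""
    updated = {}
    for row_id, observed in observed_by_row.items():
        A = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col_id, rating in observed:
            f = fixed[col_id]
            for i in range(rank):
                b[i] += rating * f[i]
                for j in range(rank):
                    A[i][j] += f[i] * f[j]
        updated[row_id] = solve(A, b)
    return updated


def objective(ratings, users, items, reg):
    """Squared error on observed ratings plus the L2 penalty on all factors."""
    sq_err = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)
    penalty = sum(dot(v, v) for v in users.values())
    penalty += sum(dot(v, v) for v in items.values())
    return sq_err + reg * penalty


def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """Alternate user and item updates; return factors and the objective per sweep."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users, history = {}, []
    for _ in range(iterations):
        users = update_factors(items, by_user, rank, reg)  # items held fixed
        items = update_factors(users, by_item, rank, reg)  # users held fixed
        history.append(objective(ratings, users, items, reg))
    return users, items, history


# Hypothetical toy data: (user_id, movie_id, rating) triples.
ratings = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 2, 1.0), (1, 3, 2.0),
    (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0),
    (3, 0, 1.0), (3, 2, 4.0), (3, 3, 5.0),
]
users, items, history = als(ratings)
predicted = dot(users[0], items[2])  # score for a movie user 0 has not rated
```

Because each half-step exactly minimizes the regularized objective for one side, the values in `history` never increase from one sweep to the next. The per-user and per-item solves are independent of one another, which is the property that lets a cluster framework run them in parallel.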

Author information

Corresponding author

Correspondence to Jung-Bin Li.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, JB., Lin, SY., Hsu, YH., Huang, YC. (2019). Implementation of an Alternating Least Square Model Based Collaborative Filtering Movie Recommendation System on Hadoop and Spark Platforms. In: Barolli, L., Leu, FY., Enokido, T., Chen, HC. (eds) Advances on Broadband and Wireless Computing, Communication and Applications. BWCCA 2018. Lecture Notes on Data Engineering and Communications Technologies, vol 25. Springer, Cham. https://doi.org/10.1007/978-3-030-02613-4_21

  • DOI: https://doi.org/10.1007/978-3-030-02613-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02612-7

  • Online ISBN: 978-3-030-02613-4

  • eBook Packages: Engineering, Engineering (R0)
