Abstract
Personalized recommendation systems have emerged as useful tools for suggesting appropriate items to individual users. However, some items, such as tourist attractions or television programs, tend to be consumed by groups of users rather than individuals. With this purpose in mind, Group Recommender Systems (GRSs) are tailored to help groups of users find suitable items according to their preferences and needs. In general, these systems confront the sparsity problem, which negatively affects their efficiency. Moreover, as the number of users, items, groups, and ratings in the system grows, the data becomes too large to be processed efficiently by traditional systems. Thus, there is an increasing need for distributed recommendation approaches able to manage the issues related to Big Data and the sparsity problem. In this paper, we propose a distributed group recommendation system, designed on top of Apache Spark to handle large-scale data. It integrates a novel recommendation method that combines a dimensionality reduction technique with supervised and unsupervised learning in order to deal efficiently with the curse of dimensionality, detect groups of users, and improve prediction quality. Experimental results on three real-world data sets show that our proposal significantly outperforms its competitors.





References
Castro J, Lu J, Zhang G, Dong Y, Martínez L (2018) Opinion dynamics-based group recommender systems. IEEE Trans Syst Man Cybern Syst Hum 48(12):2394–2406. https://doi.org/10.1109/TSMC.2017.2695158
Ekstrand MD, Riedl JT, Konstan JA (2011) Collaborative filtering recommender systems. Found Trends Hum-Comput Interact 4(2):81–173
Dakhel AM, Malazi HT, Mahdavi M (2018) A social recommender system using item asymmetric correlation. Appl Intell 48(3):527–540
Hammou BA, Lahcen AA (2017) FRAIPA: A fast recommendation approach with improved prediction accuracy. Expert Syst Appl 87:90–97
Zhang F, Gong T, Lee VE, Zhao G, Rong C, Qu G (2016) Fast algorithms to evaluate collaborative filtering recommender systems. Knowl-Based Syst 96:96–103
Christensen IA, Schiaffino S (2011) Entertainment recommender systems for group of users. Expert Syst Appl 38(11):14127–14135
Liu B (2007) Web data mining: exploring hyperlinks, contents, and usage data. Springer Science & Business Media
Ricci F, Rokach L, Shapira B (2015) Recommender systems: introduction and challenges. In: Recommender systems handbook. Springer, Boston, pp 1–34
Castro J, Yera R, Martínez L (2018) A fuzzy approach for natural noise management in group recommender systems. Expert Syst Appl 94:237–249
Boratto L, Carta S, Fenu G (2016) Discovery and representation of the preferences of automatically detected groups: Exploiting the link between group modeling and clustering. Fut Gener Comput Syst 64:165–174
Boratto L, Carta S, Fenu G (2017) Investigating the role of the rating prediction task in granularity-based group recommender systems and big data scenarios. Inf Sci 378:424–443
Apache Spark, https://spark.apache.org. Last accessed July 10, 2018
Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Stoica I (2012) Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, pp 2–2
Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: Cluster computing with working sets. HotCloud 10(10-10):95
Apache Cassandra, https://cassandra.apache.org. Last accessed July 10, 2018
Lakshman A, Malik P (2010) Cassandra: a decentralized structured storage system. ACM SIGOPS Oper Syst Rev 44(2):35–40
Apache Spark’s scalable machine learning library (MLlib), https://spark.apache.org/mllib/. Last accessed July 10, 2018
Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, Xin D (2016) MLlib: Machine learning in Apache Spark. J Mach Learn Res 17(1):1235–1241
Shani G, Gunawardana A (2011) Evaluating recommendation systems. In: Recommender systems handbook. Springer, Boston, pp 257–297
McCarthy JF, Anagnost T (1998) MusicFX: An arbiter of group preferences for computer-supported cooperative workouts. In: 1998 ACM Conference on Computer-Supported Cooperative Work (CSCW'98)
Chao DL, Balthrop J, Forrest S (2005) Adaptive radio: achieving consensus using negative preferences. In: Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work. ACM, pp 120–123
O’connor M, Cosley D, Konstan JA, Riedl J (2001) PolyLens: a recommender system for groups of users. In: ECSCW 2001. Springer, Dordrecht, pp 199–218
Aggarwal CC (2016) Recommender systems. Springer International Publishing, Cham, pp 1–28
Ardissono L, Goy A, Petrone G, Segnan M, Torasso P (2001) Tailoring the recommendation of tourist information to heterogeneous user groups. In: Workshop on adaptive hypermedia. Springer, Berlin, pp 280–295
Yu Z, Zhou X, Hao Y, Gu J (2006) TV program recommendation for multiple viewers based on user profile merging. User Model User-Adapt Interact 16(1):63–82
Quijano-Sanchez L, Recio-Garcia JA, Diaz-Agudo B, Jimenez-Diaz G (2013) Social factors in group recommender systems. ACM Trans Intell Syst Technol (TIST) 4(1):8
Chen YL, Cheng LC, Chuang CN (2008) A group recommendation system with consideration of interactions among group members. Expert Syst Appl 34(3):2082–2090
Agarwal A, Chakraborty M, Chowdary CR (2017) Does order matter? Effect of order in group recommendation. Expert Syst Appl 82:115–127
Hammou BA, Lahcen AA, Mouline S (2018) APRA: An approximate parallel recommendation algorithm for Big Data. Knowl-Based Syst 157:10–19
Garcia I, Pajares S, Sebastia L, Onaindia E (2012) Preference elicitation techniques for group recommender systems. Inf Sci 189:155–175
Castro J, Yera R, Martínez L (2017) An empirical study of natural noise management in group recommendation systems. Decis Support Syst 94:1–11
Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. ICML 1:577–584
Zhang YW, Zhou YY, Wang FT, Sun Z, He Q (2018) Service recommendation based on quotient space granularity analysis and covering algorithm on Spark. Knowl-Based Syst 147:25–35
Kashef R, Kamel MS (2009) Enhanced bisecting k-means clustering using intermediate cooperation. Pattern Recogn 42(11):2557–2569
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 785–794
Harper FM, Konstan JA (2016) The MovieLens datasets: History and context. ACM Trans Interact Intell Syst (TiiS) 5(4):19
Hu R, Dou W, Liu J (2014) ClubCF: A clustering-based collaborative filtering approach for big data application. IEEE Trans Emerg Top Comput 2(3):302–313
Appendix: Extreme gradient boosting
XGBoost (Extreme Gradient Boosting) is a powerful machine learning technique. It is a tree ensemble model consisting of a set of classification and regression trees (CARTs).
Mathematically, the XGBoost model is described as follows:

$$\hat y_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \Gamma$$

where K is the number of trees, fk is a function in the functional space Γ, and Γ is the set of all possible CARTs.
The objective function to optimize is written as follows:

$$\mathcal{L} = \sum_{i} l(\hat y_i, y_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where l is the loss function that measures the difference between the predicted value \(\hat y_i\) and the target value yi, and Ω represents the regularization term.
With regard to the training task, XGBoost is trained in an additive manner. The prediction value \(\hat y^{(t)}_i\) of the ith instance at the tth iteration is given by:

$$\hat y^{(t)}_i = \hat y^{(t-1)}_i + f_t(x_i)$$
Therefore, the objective function at the tth iteration is defined as follows:

$$\mathcal{L}^{(t)} = \sum_{i} l\left(y_i,\; \hat y^{(t-1)}_i + f_t(x_i)\right) + \Omega(f_t)$$
In order to approximate the objective function, a second-order Taylor expansion is employed. The formulation can be written as follows:

$$\mathcal{L}^{(t)} \simeq \sum_{i}\left[ l\left(y_i, \hat y^{(t-1)}_i\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)$$
where gi and hi are defined as:

$$g_i = \partial_{\hat y^{(t-1)}_i}\, l\left(y_i, \hat y^{(t-1)}_i\right), \qquad h_i = \partial^2_{\hat y^{(t-1)}_i}\, l\left(y_i, \hat y^{(t-1)}_i\right)$$
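As a concrete illustration (not part of the paper's method), for the squared-error loss \(l(y_i, \hat y_i) = \frac{1}{2}(y_i - \hat y_i)^2\) these statistics reduce to \(g_i = \hat y_i - y_i\) and \(h_i = 1\). A minimal Python sketch:

```python
# Illustrative sketch: first- and second-order gradient statistics
# (g_i, h_i) for the squared-error loss l = 0.5 * (y - y_hat)^2,
# as used in the Taylor expansion of the XGBoost objective.

def grad_stats(y_true, y_pred):
    """Return per-instance (g_i, h_i) for the squared-error loss."""
    g = [yp - yt for yt, yp in zip(y_true, y_pred)]  # g_i = dl/dy_hat
    h = [1.0] * len(y_true)                          # h_i = d^2 l/dy_hat^2
    return g, h

g, h = grad_stats([3.0, 1.0], [2.5, 2.0])
# g = [-0.5, 1.0], h = [1.0, 1.0]
```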
Let Ij be the set of instances assigned to leaf j. After defining the regularization term as \(\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2\), the objective above can be rewritten as follows:

$$\mathcal{L}^{(t)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2\right] + \gamma T$$

where T is the number of leaves in the tree, λ and γ are the regularization parameters, and wj is the weight of leaf j.
Given a fixed tree structure, the optimal weight wj∗ and the corresponding objective value are computed as follows:

$$w_j^{\ast} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad \mathcal{L}^{\ast} = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

The smaller the objective value, the better the tree structure.
Regarding the construction of trees, XGBoost adopts a greedy algorithm, which starts from a tree of depth 0 and iteratively splits the leaves of the tree.
The gain after adding a split is measured as follows:

$$\text{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda}\right] - \gamma$$

where IL and IR denote the instance sets of the left and right nodes after the split, respectively, and I = IL ∪ IR [35].
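To make the formulas concrete, the following sketch (illustrative only, with hypothetical function names, not the paper's implementation) computes the optimal leaf weight and the split gain directly from per-instance gradient statistics:

```python
# Illustrative sketch: optimal leaf weight and split gain from
# per-instance gradient statistics (g_i, h_i), following the
# closed-form expressions of the XGBoost objective.

def leaf_weight(g, h, lam):
    """Optimal leaf weight w_j* = -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting a node into left (I_L) and right (I_R) children."""
    def score(g, h):
        # Structure score of one leaf: (sum g)^2 / (sum h + lambda)
        return sum(g) ** 2 / (sum(h) + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# Example: two instances go left, one goes right (lambda = 1, gamma = 0).
gain = split_gain([-1.0, -1.0], [1.0, 1.0], [2.0], [1.0], lam=1.0, gamma=0.0)
# gain = 0.5 * (4/3 + 2 - 0) = 5/3
```

A positive gain means the split improves the objective; XGBoost greedily picks the split with the largest gain at each leaf.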
Ait Hammou, B., Ait Lahcen, A. & Mouline, S. A distributed group recommendation system based on extreme gradient boosting and big data technologies. Appl Intell 49, 4128–4149 (2019). https://doi.org/10.1007/s10489-019-01482-9