A case study of batch and incremental recommender systems in supermarket data under concept drifts and cold start
Introduction
Recommender systems are a prominent research topic. Businesses of all kinds are interested in implementing them, as they enable individualized interactions with customers based on their preferences. A recommender system predicts the probability that a particular user will prefer a given item (Zhang et al., 2019, Ricci et al., 2011). Over the years, different approaches have been developed to build recommender systems. These are categorized as collaborative filtering (CF), content-based filtering (CBF), or hybrid approaches that combine the two. Whether to use CF, CBF, or a hybrid approach depends on the availability and format of the data. CF requires only a list of user-item interactions, whereas CBF also requires descriptive details about the items. Consequently, CF is less restrictive and has been the subject of many works over the years (Bobadilla et al., 2013, Zhang et al., 2019).
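To make the CF setting concrete, the following is a minimal sketch of item-based CF over positive-only interactions: the input is just a list of (user, item) pairs, with no ratings and no item descriptions. Candidate items are scored by their cosine similarity to the items the user has already bought. The function name `item_cosine_recommend` and the toy data are hypothetical, and this is not one of the algorithms evaluated in this paper. It also hints at the cold start problem: a user with no history receives no recommendations.

```python
from collections import defaultdict
from math import sqrt


def item_cosine_recommend(interactions, user, n=3):
    """Rank unseen items for `user` by summed cosine similarity to the items
    the user has already interacted with (binary, positive-only feedback)."""
    item_users = defaultdict(set)  # item -> users who interacted with it
    user_items = defaultdict(set)  # user -> items the user interacted with
    for u, i in interactions:
        item_users[i].add(u)
        user_items[u].add(i)

    def cosine(a, b):
        # Cosine similarity between two binary item vectors over users.
        overlap = len(item_users[a] & item_users[b])
        return overlap / sqrt(len(item_users[a]) * len(item_users[b]))

    seen = user_items.get(user, set())  # empty for a cold-start user
    scores = {
        item: sum(cosine(item, j) for j in seen)
        for item in item_users
        if item not in seen
    }
    # Highest score first; ties broken by item id for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:n] if score > 0]


# Hypothetical toy purchase log: (user, item) pairs only.
interactions = [
    ("ana", "milk"), ("ana", "bread"),
    ("bob", "milk"), ("bob", "eggs"),
    ("carl", "milk"), ("carl", "bread"), ("carl", "butter"),
]
print(item_cosine_recommend(interactions, "ana"))   # ['butter', 'eggs']
print(item_cosine_recommend(interactions, "dana"))  # [] (cold-start user)
```

Because only co-occurrence of users is needed, the same code works for any domain that logs interactions, which is the practical advantage of CF over CBF noted above.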
In this paper, we focus on two problems that affect recommender systems. The first is concept drift, which refers to changes in the data behavior over time (Tsymbal, 2004, Webb et al., 2018). In recommender systems, concept drift reflects changes in the interactions between customers and items, for example because (i) customers' preferences change or (ii) new items become available for purchase. The second is cold start, which occurs when new customers or items appear in the recommendation scenario. This problem is challenging because the recommender model cannot make robust inferences about users or items for which it has not yet collected enough information (Ocepek et al., 2015, Shao et al., 2021).
Recommender systems are traditionally trained in batch fashion: given a training set of user-item interactions, a static model is learned and deployed indefinitely. Such a model cannot adapt to concept drift or to users and items that appear after training. It is therefore relevant to develop recommender systems that can be updated incrementally, assuming that user-item interactions become available as a stream of events.
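The contrast between batch and incremental training can be sketched with a hypothetical co-occurrence recommender that updates its counts one event at a time. It is evaluated here with test-then-train (prequential) evaluation, a common way to assess streaming recommenders: each event first tests the current model and only then updates it. The class `IncrementalCooccurrence`, the function `prequential_hit_rate`, the hit-rate metric, and the toy stream are illustrative assumptions, not the models or protocol used in this study.

```python
from collections import defaultdict


class IncrementalCooccurrence:
    """Hypothetical streaming recommender that counts how often two items were
    bought by the same user and updates those counts event by event."""

    def __init__(self):
        self.user_items = defaultdict(set)  # user -> items seen so far
        self.cooc = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def recommend(self, user, n=3):
        # Score unseen items by their co-occurrence with the user's history.
        seen = self.user_items.get(user, set())
        scores = defaultdict(int)
        for i in seen:
            for j, count in self.cooc[i].items():
                if j not in seen:
                    scores[j] += count
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:n]]

    def update(self, user, item):
        # Only the counts touched by this single event change; no retraining.
        history = self.user_items[user]
        if item in history:
            return
        for other in history:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        history.add(item)


def prequential_hit_rate(model, stream, n=3):
    """Test-then-train: each (user, item) event first tests the current model
    (a hit if the item is in the top-n list), then updates it."""
    hits = 0
    for user, item in stream:
        hits += item in model.recommend(user, n)
        model.update(user, item)
    return hits / len(stream)


# Hypothetical event stream in arrival order.
stream = [("ana", "milk"), ("ana", "bread"), ("bob", "milk"),
          ("bob", "bread"), ("carl", "milk"), ("carl", "bread")]
print(prequential_hit_rate(IncrementalCooccurrence(), stream))  # 2 hits out of 6 events
```

The first events are misses because the model starts empty (cold start), but later users benefit from patterns learned earlier in the stream. A batch model trained once would never incorporate those later events.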
This paper presents a case study of existing recommender algorithms in a real-world supermarket scenario that exhibits both concept drift and cold start. The contribution of this paper is threefold:
- A comparison of existing collaborative filtering recommender systems in which the impact of concept drift and cold start is assessed in both batch and streaming settings.
- Evidence that the ability of streaming recommender systems to learn incrementally allows better recovery when cold start is present.
- A novel real-world supermarket dataset that exhibits concept drift and cold start, made publicly available.
Recommender systems
Recommender systems have been successfully applied in many real-world scenarios, proving effective at handling diverse information about users, items, and their interactions. The goal of recommender systems is personalization, that is, the ability to recommend relevant items to specific users based on their past interactions with the system. Considering the type of information used in the recommendation process and how the system models the user-item relationship, recommender systems
Positive-only approaches for recommender systems
This section reviews existing approaches for positive-only recommender systems, that is, systems that learn only from observed (positive) interactions. We organize the existing methods into batch and streaming approaches depending on how training takes place.
Supermarket dataset with implicit feedback (SMDI)
This section describes the Supermarket Dataset with Implicit Feedback (SMDI) used in this case study, covering data acquisition, pre-processing, and descriptive statistics.
Experimental protocol
This section describes the experimental protocol used to compare batch and streaming algorithms on the SMDI dataset. The protocol ensures that batch and streaming methods are compared fairly and makes it possible to identify concept drift and cold start problems.
Fig. 7 shows the proposed batch and stream protocols. The dataset was split using the first two months of data for training and the remaining two months for testing. The temporal split makes more sense than a
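A temporal split of this kind, together with a measure of how much cold start it induces, can be sketched as follows. The sketch assumes each transaction is a (user, item, date) tuple; the function names, the cutoff, and the dates are hypothetical and do not reflect the actual SMDI time range or the authors' code.

```python
from datetime import date


def temporal_split(transactions, cutoff):
    """Split (user, item, day) records at a cutoff date: earlier records form
    the training set, later ones the test set. No shuffling, so the model is
    always evaluated on interactions that happen after its training data."""
    train = [t for t in transactions if t[2] < cutoff]
    test = [t for t in transactions if t[2] >= cutoff]
    return train, test


def cold_start_rates(train, test):
    # Fraction of test interactions whose user / item never appears in training.
    known_users = {u for u, _, _ in train}
    known_items = {i for _, i, _ in train}
    new_user = sum(u not in known_users for u, _, _ in test) / len(test)
    new_item = sum(i not in known_items for _, i, _ in test) / len(test)
    return new_user, new_item


# Hypothetical four months of data; the first two months are used for training.
transactions = [
    ("ana", "milk", date(2021, 1, 5)),
    ("bob", "bread", date(2021, 2, 10)),
    ("ana", "eggs", date(2021, 3, 2)),
    ("carl", "milk", date(2021, 3, 15)),
    ("bob", "tea", date(2021, 4, 20)),
    ("dana", "tea", date(2021, 4, 28)),
]
train, test = temporal_split(transactions, cutoff=date(2021, 3, 1))
print(cold_start_rates(train, test))  # (0.5, 0.75)
```

A random split would mix future interactions into training and hide both effects, which is why a temporal split is the natural choice when studying drift and cold start.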
Experimental results and analysis
This section reports the experimental results observed when comparing batch and streaming recommender algorithms on the SMDI datasets. We discuss the results of the two evaluation strategies defined in the experimental protocol: the basic evaluation in Section 6.1 and the window-based evaluation in Section 6.2.
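One simple way to realize a window-based evaluation is to average per-event hits over consecutive windows, so that a drop and later recovery in performance becomes visible over time. The sketch below assumes fixed-size, non-overlapping windows of events and a 0/1 hit sequence (for example, from test-then-train evaluation); the function `windowed_hit_rate` is hypothetical, and the windows used in this study may be defined differently, for instance by calendar time.

```python
def windowed_hit_rate(hits, window_size):
    """Average a 0/1 hit sequence over consecutive, non-overlapping windows.
    The last window may be shorter if len(hits) is not a multiple of the size."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    rates = []
    for start in range(0, len(hits), window_size):
        window = hits[start:start + window_size]
        rates.append(sum(window) / len(window))
    return rates


# Hypothetical hit sequence: a dip in the middle window, then recovery.
hits = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0]
print(windowed_hit_rate(hits, 4))  # [0.75, 0.25, 0.75]
```

Reporting per-window values rather than a single average keeps temporary degradations, such as those caused by drift or an influx of new users, from being averaged away.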
Conclusion
This paper analyzed batch and stream learning algorithms with respect to concept drift and the cold start problem. As a by-product of this work, we made publicly available a new collaborative filtering supermarket dataset, alongside two pre-processed variants. This analysis showed that streaming recommender systems significantly outperform batch approaches. Thus, more effort should be put into developing techniques at the intersection of data streams and recommender systems. For
CRediT authorship contribution statement
Antônio David Viniski: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Jean Paul Barddal: Conceptualization, Methodology, Validation, Formal analysis, Writing - original draft, Visualization, Supervision, Project administration. Alceu Souza Britto Jr.: Conceptualization, Writing - review & editing, Supervision, Project administration. Fabrício Enembreck: Conceptualization, Writing - review & editing. Humberto Vinicius
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq (Grant #142195/2019-7) for financing this research and the HiMarket company for its financial support and for making the data analyzed in this work available. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research.
References (49)
- Novel hybrid pair recommendations based on a large-scale comparative study of concept drift detection. Expert Systems with Applications, 2021.
- Recommender systems survey. Knowledge-Based Systems, 2013.
- Improving matrix factorization recommendations for examples in cold start. Expert Systems with Applications, 2015.
- The use of machine learning algorithms in recommender systems: A systematic review. Expert Systems with Applications, 2018.
- A survey of research hotspots and frontier trends of recommendation systems from the perspective of knowledge graph. Expert Systems with Applications, 2021.
- The pure cold-start problem: A deep study about how to conquer first-time users in recommendations domains. Information Systems, 2019.
- Collaborative filtering and deep learning based recommendation system for cold start items. Expert Systems with Applications, 2017.
- Revenue prediction by mining frequent itemsets with customer analysis. Engineering Applications of Artificial Intelligence, 2017.
- Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur,...