A case study of batch and incremental recommender systems in supermarket data under concept drifts and cold start

https://doi.org/10.1016/j.eswa.2021.114890

Highlights

  • Retail data made available depicts concept drift and cold start problems.

  • Neural networks are effective in recommending items to supermarket users.

  • Streaming recommenders outperform other methods in drifting and cold start issues.

Abstract

Recommender systems uncover relationships between users and items, thus allowing personalized recommendations. Nonetheless, users’ preferences may change over time, the so-called concept drifts; or new users and items may appear, making the recommender system unable to accurately map the relationship between users and items due to the cold start problem. Consequently, concept drift and cold start are challenges that degrade the recommender system’s predictive performance. This paper assesses existing approaches for collaborative-filtering recommender systems over a real supermarket dataset that exhibits both of the issues mentioned above. For this purpose, our comparative analysis encompasses batch and streaming learning approaches. As a result, we observe that streaming-based models achieve better recommendation rates since they are tailored to adapt to concept drift. More specifically, the predictive performance of streaming-based recommendations increases by up to 21% over those provided by batch methods. The supermarket dataset used in experimentation is also made publicly available for future studies and recommender systems comparisons.

Introduction

Recommender systems are a hot topic in today’s world. Businesses of all kinds are interested in implementing recommender systems, as these allow individualized interactions with customers based on their preferences. A recommender system predicts the probability that a particular user will prefer an item (Zhang et al., 2019, Ricci et al., 2011). Over the years, different approaches have been developed to build recommender systems. These are categorized as collaborative filtering (CF), content-based filtering (CBF), or hybrid approaches that combine the two. Determining whether to use CF, CBF, or hybrid approaches depends on the availability and format of the data. In CF, the only data required is a list of user-item interactions, while in CBF, items’ details are required. Consequently, CF is less restrictive and has been the target of many works over the years (Bobadilla et al., 2013, Zhang et al., 2019).

In this paper, we focus on two problems that affect recommender systems. The first is concept drift, which refers to changes in the data behavior over time (Tsymbal, 2004, Webb et al., 2018). In recommender systems, concept drift reflects changes in the interactions between customers and items, for instance because (i) customers’ preferences change or (ii) new items become available for purchase. The second is cold start, which occurs when new customers or items appear in the recommendation scenario. Such a problem is challenging because the recommender model cannot make robust inferences for users or items about which it has not yet collected enough information (Ocepek et al., 2015, Shao et al., 2021).
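To make the cold start notion concrete, the following minimal sketch flags incoming interactions whose user or item has never been seen before. It is only an illustration under stated assumptions: the function name `cold_start_flags`, the `(user, item)` tuple layout, and the toy data are hypothetical and are not taken from the paper or its dataset.

```python
def cold_start_flags(history, new_events):
    """Label each new (user, item) event with two booleans:
    whether the user is unseen and whether the item is unseen in `history`."""
    known_users = {user for user, _ in history}
    known_items = {item for _, item in history}
    return [
        (user, item, user not in known_users, item not in known_items)
        for user, item in new_events
    ]

# Hypothetical toy interactions (not from the SMDI dataset).
history = [("u1", "bread"), ("u2", "milk")]
new_events = [("u1", "milk"), ("u3", "bread"), ("u2", "coffee")]

flags = cold_start_flags(history, new_events)
for user, item, new_user, new_item in flags:
    print(user, item, "new user" if new_user else "", "new item" if new_item else "")
```

Here `u3` is a cold-start user and `coffee` is a cold-start item, since neither appears in the interaction history.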

Recommender systems are traditionally trained in a batch fashion, which means that given a training set composed of interactions between users and items, a static model is learned and deployed ad aeternum. Such a static model cannot adapt to drifting preferences or to users and items that appear after training. Consequently, it is relevant to tailor recommender systems that can be updated incrementally over time, assuming that the interactions between users and items are made available as a stream of events.
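The difference between a static batch model and an incrementally updated one can be sketched with a deliberately simple popularity-based recommender. This is not one of the algorithms compared in the paper; the helper names `top_n` and `hit_rate`, the test-then-train loop, and the toy data are assumptions chosen purely for illustration.

```python
from collections import Counter

def top_n(counts, n):
    """Return the n most frequent items, breaking ties by item name."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

def hit_rate(train, test, n=1, incremental=False):
    """Fraction of test events whose item is in the current top-n list.

    With incremental=True, each event is first used for testing and then
    for updating the item counts (test-then-train). Otherwise the counts
    stay frozen after the initial batch training.
    """
    counts = Counter(item for _, item in train)
    hits = 0
    for _user, item in test:
        if item in top_n(counts, n):
            hits += 1
        if incremental:
            counts[item] += 1
    return hits / len(test)

# Hypothetical toy data: preferences drift from bread to coffee.
train = [("u1", "bread"), ("u2", "bread"), ("u3", "milk")]
test = [("u1", "coffee"), ("u2", "coffee"), ("u3", "coffee"),
        ("u4", "coffee"), ("u1", "coffee")]

batch_rate = hit_rate(train, test)
stream_rate = hit_rate(train, test, incremental=True)
print(batch_rate, stream_rate)  # 0.0 0.4
```

In this toy setting the frozen model never recommends the newly popular item, whereas the incremental model starts hitting once it has observed enough recent events.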

In this paper, our goal is to bring forward a case study of existing recommender algorithms in a real-world supermarket scenario that exhibits both concept drifts and cold start problems. The contribution of this paper is threefold, as follows:

  • A comparison of existing filtering recommender systems in which the impact of concept drift and cold start is assessed in both batch and streaming fashions.

  • Evidence that the incremental ability of the streaming-based recommender systems allows a better recovery when cold start is present.

  • A novel real-world supermarket dataset that exhibits concept drifts and cold start problems is made publicly available.

This paper is organized as follows. Section 2 describes recommender systems, types of feedback, the problem of concept drift, and the challenge of cold start. Section 3 describes existing works on positive-only recommender systems in both batch and stream learning scenarios. Section 4 details a new dataset we make available regarding supermarket transactions, which exhibits concept drift and cold start characteristics. Section 5 describes the experiments undertaken to perform the proposed analysis of recommender systems. Section 6 analyzes existing works in recommender systems to answer whether streaming recommender systems outperform batch approaches w.r.t. concept drifts and cold start. Finally, Section 7 concludes this paper and describes future work.

Section snippets

Recommender systems

Recommender systems have been successfully applied in many real-world scenarios, proving efficient in handling diverse information related to users, items, and their interaction. The goal of recommender systems is personalization, which refers to the ability to recommend relevant items to specific users based on their past interactions with the system. Considering the type of information used in the recommendation process and how the system models the user-item relationship, recommender systems

Positive-only approaches for recommender systems

In this section, we bring forward existing approaches for positive-only recommender systems. We organize the existing methods into batch and streaming approaches depending on how training takes place.

Supermarket dataset with implicit feedback (SMDI)

This section describes the Supermarket Dataset with Implicit Feedback (SMDI) used in this case study, covering data acquisition, pre-processing, and descriptive statistics.

Experimental protocol

This section describes the experimental protocol used to compare batch and streaming algorithms in the SMDI dataset. This experimental protocol is relevant to guarantee that batch and streaming methods are adequately compared and to enable the identification of concept drifts and cold start problems.

Fig. 7 shows the proposed batch and stream protocols. The dataset was split using the first two months of data for training and the remaining two months for testing. The temporal split makes more sense than a
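A temporal split of this kind can be sketched as follows. The sketch assumes events stored as `(user, item, day)` tuples and a cutoff date marking the start of the test period; the function name `temporal_split`, the dates, and the toy events are hypothetical and do not reflect the actual SMDI schema or period.

```python
from datetime import date

def temporal_split(events, cutoff):
    """Split (user, item, day) events by time:
    events strictly before `cutoff` go to training, the rest to testing."""
    train = [event for event in events if event[2] < cutoff]
    test = [event for event in events if event[2] >= cutoff]
    return train, test

# Hypothetical four-month log; the first two months are used for training.
events = [
    ("u1", "bread", date(2020, 1, 5)),
    ("u2", "milk", date(2020, 2, 20)),
    ("u1", "coffee", date(2020, 3, 1)),
    ("u3", "bread", date(2020, 4, 10)),
]

train, test = temporal_split(events, cutoff=date(2020, 3, 1))
print(len(train), len(test))  # 2 2
```

Because the split follows time order, users or items that first appear in the test period (such as `u3` and `coffee` in this toy log) remain unseen during training, which keeps drift and cold start effects visible in the evaluation.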

Experimental results and analysis

This section reports the experimental results observed when comparing batch and streaming recommender algorithms applied to the SMDI datasets. We discuss the observations of the two proposed strategies planned in the experimental protocol, as follows: the basic evaluation in Section 6.1 and the window-based evaluation in Section 6.2.

Conclusion

This paper analyzed batch and stream learning algorithms concerning concept drifts and the cold start problem. As a by-product of this work, we made publicly available a new collaborative filtering supermarket dataset, alongside two pre-processed variants. As a result of this analysis, we observed that streaming recommender systems significantly outperform batch approaches. Thus, more effort should be put into tailoring techniques at the intersection of data streams and recommender systems. For

CRediT authorship contribution statement

Antônio David Viniski: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Jean Paul Barddal: Conceptualization, Methodology, Validation, Formal analysis, Writing - original draft, Visualization, Supervision, Project administration. Alceu Souza Britto Jr.: Conceptualization, Writing - review & editing, Supervision, Project administration. Fabrício Enembreck: Conceptualization, Writing - review & editing. Humberto Vinicius

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq (Grant #142195/2019-7) for financing this research and the HiMarket company for the financial support and making the data analyzed in this work available. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research.

References (49)

  • Aggarwal, C. C. (Ed.) (2007). Data streams – models and algorithms. Volume 31 of advances in database systems....
  • Beel, J. et al. (2016). Research-paper recommender systems: A literature survey. International Journal on Digital Libraries.
  • Bi, Y., Song, L., Yao, M., Wu, Z., Wang, J. & Xiao, J. (2020). DCDIR: A deep cross-domain recommendation system for...
  • Breiman, L. (1996). Bagging predictors. Machine Learning.
  • Chandramouli, B. et al. Streamrec: A real-time recommender system.
  • Chang, S. et al. Streaming recommender systems.
  • Christy, A. J., Umamakeswari, A., Priyatharsini, L. & Neyaa, A. (2018). Rfm ranking–an effective approach to customer...
  • Cremonesi, P. et al. Performance of recommender algorithms on top-n recommendation tasks.
  • Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research.
  • Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association.
  • Funk, S. (2006). Netflix update: Try this at...
  • Gaber, M. M. et al. (2005). Mining data streams: A review. SIGMOD Record.
  • Gama, J. (2010). Knowledge discovery from data streams.
  • Gama, J. et al. (2013). On evaluating stream learning algorithms. Machine Learning.