Nearest biclusters collaborative filtering framework with fusion

https://doi.org/10.1016/j.jocs.2017.03.018

Highlights

  • Propose to fuse the item-based CF and user-based CF.

  • New similarity measure to obtain the Nearest Biclusters.

  • Our approach greatly outperforms classical collaborative filtering techniques.

Abstract

Collaborative filtering (CF) is one of the most widely used recommendation techniques. It provides automated, personalized suggestions to consumers choosing among a variety of products by examining their preferences. However, sparsity is one of the major weaknesses of this successful approach. The problem arises inherently from the ever-increasing number of users and items, and it degrades the performance of a recommender system because prediction accuracy decreases. Thus, there is a need for a technique that performs well in a sparse environment, and this work proposes one such technique. Memory-based CF techniques can be user-based or item-based. In both cases, the user-item rating matrix provides only partial information for predicting unknown ratings, owing to the sparsity inherent in rating data. Hence, we propose to fuse item-based CF and user-based CF. Neighborhood formation is also a crucial step in collaborative filtering, so this paper adopts a biclustering approach for neighborhood formation. This approach allows a degree of overlap between biclusters (i.e., a user or item may be included in more than one cluster). A new similarity measure is therefore proposed to find the bicluster that has strong partial similarity with an active user's preferences. Experimental results demonstrate that the proposed approach predicts ratings more accurately than the traditional item-based and user-based approaches and some state-of-the-art approaches.

Introduction

A Recommender System (RS) [1], [2], [3] is an information-filtering tool that helps to mitigate the information overload caused by the explosion of information available on the Internet. It attempts to reduce the amount of information presented to the user by showing only items and products that are likely to be of interest. For example, sites like Netflix recommend films, while Flipkart and Amazon recommend products of interest to their users. An RS collects information on the preferences of its users for a set of items (e.g., songs, electronic products, books) and uses this collected information [4] to provide recommendations to other users. In everyday life, we often rely on suggestions from like-minded individuals or other trusted sources. A Recommender System is an automated form of this word-of-mouth phenomenon. The operation of an RS is generally based on the Collaborative filtering (CF) [5] technique. A CF-based recommendation system is inspired by human social behavior: if users have had similar tastes for a set of items in the past, they are expected to share common tastes in the future.

CF algorithms can be categorized into Memory-based and Model-based algorithms [6], [7]. Model-based algorithms learn a model in an offline phase and then use that model to make recommendations [8]. Memory-based algorithms, on the other hand, make recommendations based on the similarity between users or items [9]. Memory-based algorithms are further categorized into User-based CF and Item-based CF algorithms [9]. User-based CF algorithms make recommendations by using the previous ratings that the most similar users gave to the specific item. Item-based CF algorithms, in contrast, give recommendations by finding similar items [10].
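To make the user-based idea concrete, the following is a minimal sketch of the textbook formulation: Pearson correlation over co-rated items, followed by a similarity-weighted, mean-centered average over the nearest neighbours. It is not taken from the paper; the function names, the toy `ratings` dictionary, and the choice of `k` are illustrative assumptions.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation computed over the items both users have rated."""
    common = [i for i in ratings_a if i in ratings_b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

def predict_user_based(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated `item`."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    neighbours = []
    for other, other_ratings in ratings.items():
        if other != user and item in other_ratings:
            sim = pearson(ratings[user], other_ratings)
            if sim > 0:  # keep only positively correlated neighbours
                neighbours.append((sim, other))
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    if not top:
        return user_mean  # fall back to the user's own mean rating
    num = 0.0
    den = 0.0
    for sim, other in top:
        other_mean = sum(ratings[other].values()) / len(ratings[other])
        num += sim * (ratings[other][item] - other_mean)
        den += sim
    return user_mean + num / den

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 3, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}
print(round(predict_user_based(ratings, "u1", "i4"), 2))  # 4.75
```

In this toy example only `u2` correlates positively with `u1`, so the prediction is `u1`'s mean rating plus `u2`'s deviation from its own mean on `i4`. An item-based variant would apply the same logic to the columns of the rating matrix rather than the rows.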

Although CF is one of the most popular and successful approaches to recommendation, it suffers from a serious limitation, namely the sparsity problem. The sparsity problem [11] refers to the situation in which the ratio of ratings that need to be predicted to ratings already obtained is very high, e.g., more than 90% of the ratings must be predicted from less than 10% of available data. For instance, an RS used by an online store can expose its entire catalogue to the customer. As the number of products and users grows tremendously, most items, even very popular ones, have been rated or purchased by only a few users.
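As a quick illustration of what sparsity means numerically, the sketch below computes the fraction of empty cells in a user-item rating matrix. The function name is an assumption; the figures are the MovieLens 100K counts given in the Data sets section (100,000 ratings, 943 users, 1682 movies).

```python
def sparsity(n_ratings, n_users, n_items):
    """Fraction of user-item cells that have no observed rating."""
    return 1.0 - n_ratings / (n_users * n_items)

# MovieLens 100K counts as reported in the Data sets section
s = sparsity(100_000, 943, 1682)
print(f"{s:.2%}")  # 93.70%
```

The result, about 93.7% when rounded, is in line with the 93.69% sparsity reported for that dataset.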

A CF-based recommendation system first determines the most similar users or items and then makes recommendations to individual users. It relies on other users' past purchasing histories, even though the system has only a limited amount of available data, i.e., less than 1% [12], [13]. Typically, the neighborhood is formed either by using a similarity measure or by adopting a clustering approach to find the most similar users or items. Because of sparsity, it is hard to define the similarity between items or users, which can render CF useless. Even if the framework succeeds in computing a similarity, that similarity may not be reliable because too little information was available to compute it.

To overcome this problem, this work proposes a new CF algorithm based on user/item preference biclustering to address the sparsity issue. First, the biclustering approach is used to cluster the rows and columns of the user/item rating matrix simultaneously, which helps the system find high-quality neighbors. As a result, we obtain clusters of users/items with strong partial similarity within each cluster. Then the nearest bicluster of the active user/item is computed. To obtain the nearest bicluster of each user, we propose a new similarity approach, based on the Mean Measure of Divergence, that takes into account each individual's personal habits in expressing preferences. The resulting cluster helps to identify the neighbor users or neighbor items. Similarity is then computed between users in the case of User-based CF, or between items in the case of Item-based CF. Finally, the predictions of the two models are combined using a weighted sum of the item-based CF and user-based CF approaches to generate recommendation results for target users.
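The final fusion step can be sketched as follows. The text above states only that a weighted sum of the two predictions is used; the convex-combination form, the parameter name `weight`, its default value, and the clipping to a 1-5 rating scale are assumptions made for illustration.

```python
def fuse_predictions(user_based, item_based, weight=0.5):
    """Weighted sum of two rating predictions.

    `weight` is applied to the user-based prediction and
    (1 - weight) to the item-based one (assumed convex combination).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * user_based + (1.0 - weight) * item_based

def clip_rating(rating, low=1.0, high=5.0):
    """Keep a fused prediction inside the rating scale (1-5 assumed)."""
    return max(low, min(high, rating))

# Hypothetical predictions from the two models for one user-item pair
fused = clip_rating(fuse_predictions(4.2, 3.6, weight=0.7))
print(round(fused, 2))  # 4.02
```

In practice the weight would be tuned on held-out ratings; setting it to 1 or 0 recovers the pure user-based or pure item-based prediction, respectively.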

Section snippets

Background work

In the past few years, numerous researchers have combined clustering [14], [15], [16] methods with CF-based recommender systems to handle the sparsity problem. Sarwar et al. [12] created clusters of users with similar preferences by partitioning the ratings database. O’Connor & Herlocker [17] created clusters of items by applying different clustering algorithms. Xue et al. [18] used a clustering approach to smooth the rating dataset and to determine k-Nearest Neighbor

Proposed methodology: nearest biclusters collaborative filtering with fusion (NBCFu)

Fig. 1 presents an overview of the proposed approach, called NBCFu (Nearest Biclusters Collaborative Filtering with fusion). In the first phase, the xMotif biclustering algorithm is applied to group similar users/items into overlapping biclusters. In the next phase, the nearest biclusters of each user and item are computed. The user-based and item-based models are then each applied to predict the rating of every unseen item. Finally, the resultant predictions of each model are combined to

Data sets

The performance of NBCFu is evaluated on three extensively tested real-world datasets, namely MovieLens 100K, MovieLens 1M (http://grouplens.org/datasets/movielens/) and EachMovie. MovieLens 100K (ML-100K) consists of 100,000 ratings (1–5) on 1682 movies by 943 users. Each user has rated at least 20 movies, but this dataset is still 93.69% sparse. MovieLens 1M (ML-1M) consists of approximately one million ratings (1–5) for 3952 movies reviewed by 6040 users. This dataset is nearly 95.75% sparse. The

Conclusion

Over the past decade, CF has remained one of the most popular and widely accepted methods for handling the information overload problem effectively. It aims to suggest suitable items to a user based on rating information collected from other, similar users.

Although CF methods are very successful and popular in many areas, they often confront the sparsity problem. In this paper, we propose to merge item-based and user-based CF in a weighted sum approach. The item-based CF and user-based CF take

Surya Kant received the B.Tech degree in Computer Science & Engineering from UPTU in 2009 and the M.Tech from NIT Jalandhar in 2012. Currently, he is a Ph.D. student with the Department of Polymer and Process Engineering, IIT Roorkee, India. His research interests include Data Mining and Machine Learning.

References (44)

  • R. Katarya et al.

    A collaborative recommender system enhanced with particle swarm optimization technique

    Multimed. Tools Appl.

    (2016)
  • V.K. Jain et al.

    Extraction of emotions from multilingual text using intelligent text processing and computational linguistics

    J. Comput. Sci.

    (2017)
  • H. Kim, J. An, C. Wook, A Novel Evolutionary Approach to Recommender Systems, no. xx, pp. 2–5,...
  • A.M. Rashid et al.

    ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm

    Search

    (2006)
  • J.-M. Yang, K. F. Li, Recommendation based on rational inferences in collaborative filtering,...
  • Y. Shi et al.

    Collaborative filtering beyond the user-item matrix

    ACM Comput. Surv.

    (2014)
  • Y. Cai et al.

    Typicality-based collaborative filtering recommendation

    IEEE Trans. Knowl. Data Eng.

    (2014)
  • K.G. Saranya, G.S. Sadasivam, Modified Heuristic Similarity Measure for Personalization using Collaborative Filtering...
  • B.M. Sarwar et al.

    Recommender systems for large-scale E-Commerce: scalable neighborhood formation using clustering

    Communications

    (2002)
  • B. Sarwar et al.

    Item-based collaborative filtering recommendation algorithms

    Proc. 10th …

    (2001)
  • S. Kant et al.

    An improved K means clustering with Atkinson index to classify liver patient dataset

    Int. J. Syst. Assur. Eng. Manage.

    (2016)
  • G. Costa et al.

    Model-based collaborative personalized recommendation on signed social rating networks

    ACM Trans. Internet Technol.

    (2016)


Tripti Mahara received the B.E. degree in Computer Science & Engineering from Sardar Patel University in 1999, and the M.Tech and Ph.D. degrees in Industrial and Management Engineering from I.I.T Kanpur in 2004 and 2009, respectively. Currently, she is an Assistant Professor with the Department of Polymer and Process Engineering, IIT Roorkee, India. Her research interests include ERP and Data Mining.
