Nearest biclusters collaborative filtering framework with fusion

doi:10.1016/j.jocs.2017.03.018

Journal of Computational Science

Volume 25, March 2018, Pages 204-212

Highlights

• Propose to fuse the item-based CF and user-based CF.
• New similarity measure to obtain the Nearest Biclusters.
• Our approach greatly outperforms classical collaborative filtering techniques.

Abstract

Collaborative filtering (CF) is one of the most widely used recommendation techniques. It provides automated, personalized suggestions that help consumers choose among a variety of products by examining their preferences. However, sparsity is one of the major weaknesses of this successful approach. The problem arises inherently because the number of users and items keeps increasing, and it degrades the performance of a recommender system because prediction accuracy decreases. Thus, there is a need for a technique that performs well in a sparse environment, and this work proposes one such technique. Memory-based CF techniques can be user-based or item-based. In both cases, the user-item rating matrix provides only partial information for predicting unknown ratings, owing to the sparsity inherent in rating data. Hence, we propose to fuse item-based CF and user-based CF. Neighborhood formation is also a crucial step in collaborative filtering, so this paper adopts a biclustering approach for neighborhood formation. This approach allows a degree of overlap between biclusters (i.e., a user or item may be included in more than one cluster). A new similarity measure is therefore proposed that finds the bicluster with strong partial similarity to an active user's preferences. Experimental results demonstrate that the proposed approach predicts ratings more accurately than the traditional item-based and user-based approaches and some state-of-the-art approaches.

Introduction

A Recommender System (RS) [1], [2], [3] is an information-filtering tool that helps mitigate the information overload caused by the explosion of information available on the Internet. It reduces the amount of information presented to the user by showing only items and products that are likely to be of interest. For example, sites such as Netflix recommend films, while Flipkart and Amazon recommend products of interest to their users. An RS collects information on its users' preferences for a set of items (e.g., songs, electronic products, books) and uses this information [4] to provide recommendations to other users. In everyday life, we often rely on the suggestions of like-minded individuals or other trusted sources; a Recommender System is an automated form of this word-of-mouth phenomenon. The operation of an RS is generally based on the Collaborative Filtering (CF) [5] technique. CF-based recommendation is inspired by human social behavior: if users had similar tastes for a set of items in the past, they are assumed to share common tastes in the future.

CF algorithms can be categorized into memory-based and model-based algorithms [6], [7]. Model-based algorithms learn a model in an offline phase and then use that model to make recommendations [8]. Memory-based algorithms, on the other hand, make recommendations based on the similarity between users or items [9]. They are further categorized into user-based CF and item-based CF algorithms [9]. User-based CF algorithms make recommendations using the previous ratings that the most similar users gave to the specific item, whereas item-based CF algorithms make recommendations by finding similar items [10].
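As an illustration of the user-based variant described above, the following is a minimal sketch and not the method evaluated in this paper. It assumes Pearson correlation over co-rated items as the similarity measure and a mean-centred weighted average as the prediction rule, both common textbook choices. The toy ratings and the function names (`pearson`, `predict_user_based`) are hypothetical.

```python
from math import sqrt

# Toy ratings (user -> {item: rating}); all names and values are made up.
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 3, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(u, v):
    """Pearson correlation of users u and v over their co-rated items."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # too little overlap to say anything
    mu_u = mean(ratings[u][i] for i in common)
    mu_v = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    return num / (den_u * den_v) if den_u and den_v else 0.0

def predict_user_based(u, item):
    """Mean-centred weighted average of the neighbours' ratings for `item`."""
    base = mean(ratings[u].values())
    num = den = 0.0
    for v in ratings:
        if v == u or item not in ratings[v]:
            continue  # only users who rated the item can contribute
        s = pearson(u, v)
        num += s * (ratings[v][item] - mean(ratings[v].values()))
        den += abs(s)
    return base if den == 0 else base + num / den

print(predict_user_based("alice", "i4"))
```

An item-based predictor follows the same pattern with the roles of users and items swapped: similarities are computed between item columns, and the prediction averages the active user's own ratings on the most similar items.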

Although CF is one of the most popular and successful approaches to recommendation, it suffers from a serious limitation, namely the sparsity problem. The sparsity problem [11] refers to the situation in which the ratio of ratings to be predicted to ratings already obtained is very high, e.g., more than 90% of the ratings must be predicted from less than 10% of available data. For instance, an RS used by an online store can make every existing product available to the customer. As the number of products and users grows tremendously, most items, even very popular ones, have been rated or purchased by only a few users.
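To make the notion of sparsity concrete, the sketch below computes the fraction of empty cells in a user-item rating matrix. The function name `sparsity` is hypothetical; the counts are the MovieLens 100K figures quoted in the Data sets section (100,000 ratings, 943 users, 1682 movies).

```python
def sparsity(num_ratings, num_users, num_items):
    """Fraction of cells in the user-item matrix that hold no rating."""
    return 1.0 - num_ratings / (num_users * num_items)

# MovieLens 100K counts as reported in the Data sets section.
ml100k = sparsity(100_000, 943, 1682)
print(f"ML-100K sparsity: {ml100k:.4f}")
```

The result is roughly 0.937, in line with the 93.69% sparsity reported for this dataset: fewer than 7 in 100 possible user-item pairs carry a rating.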

A CF-based recommendation system first determines the most similar users or items and then makes recommendations to individual users. It relies on other users' past purchasing histories, even though the system has only a limited amount of available data, i.e., less than 1% [12], [13]. Typically, the neighborhood is formed either by using a similarity measure or by adopting a clustering approach to find the most similar users or items. Because of sparsity, it is hard to define the similarity between items or users, which can render CF useless. Even if the similarity can be computed, it may not be trustworthy because too little information was available to compute it.

To overcome this problem, this work proposes a new CF algorithm based on biclustering of user/item preferences. First, the biclustering approach is adapted to simultaneously cluster the rows and columns of the user/item rating matrix, which helps the system find high-quality neighbors. As a result, we obtain clusters of users/items with strong partial similarity within each cluster. Next, the nearest bicluster of the active user/item is computed. To obtain the nearest bicluster of each user, we propose a new similarity approach based on the Mean Measure of Divergence, which takes into account each individual's personal habits in expressing preferences. The resulting cluster helps identify the neighbor users or neighbor items. The similarity between users (for user-based CF) or between items (for item-based CF) is then computed. Finally, the predictions of the two models are combined using a weighted sum of the item-based CF and user-based CF predictions to generate recommendations for target users.
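The final fusion step can be sketched as a convex combination of the two predictions. This is a minimal illustration under stated assumptions: the function name `fuse` and the parameter `weight` are hypothetical, and how the paper actually sets the weight is not shown in this excerpt.

```python
def fuse(user_based_pred, item_based_pred, weight=0.5):
    """Weighted sum of a user-based and an item-based rating prediction.

    `weight` is a hypothetical tuning parameter in [0, 1]:
    1.0 uses only the user-based prediction, 0.0 only the item-based one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * user_based_pred + (1.0 - weight) * item_based_pred

# Example: the two models disagree; the fused value lies between them.
print(fuse(4.0, 3.0, weight=0.5))
```

Keeping the weight in [0, 1] guarantees that the fused prediction stays between the two individual predictions, so it remains within the rating scale whenever both inputs do.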

Background work

In the past few years, many researchers have combined clustering [14], [15], [16] methods with CF-based recommender systems to handle the sparsity problem. Sarwar et al. [12] created clusters of users with similar preferences by partitioning the ratings database. O’Connor & Herlocker [17] created clusters of items by applying different clustering algorithms. Xue et al. [18] utilized a clustering approach for smoothing the rating dataset and to determine k-Nearest Neighbor

Proposed methodology: nearest biclusters collaborative filtering with fusion (NBCFu)

Fig. 1 presents an overview of the proposed approach, called NBCFu (Nearest Biclusters Collaborative Filtering with fusion). In the first phase, the xMotif biclustering algorithm is applied to group similar users/items into overlapping biclusters. In the next phase, the nearest biclusters of each user and item are computed. Then the user-based and item-based models are each applied to predict the rating of every unseen item. Finally, the resultant predictions of each model are combined to

Data sets

The performance of NBCFu is evaluated on three extensively tested real-world datasets, namely MovieLens 100K, MovieLens 1M (http://grouplens.org/datasets/movielens/) and EachMovie. MovieLens 100K (ML-100K) consists of 100,000 ratings (1–5) on 1682 movies by 943 users. Each user has rated at least 20 movies, yet this dataset is still 93.69% sparse. MovieLens 1M (ML-1M) consists of approximately one million ratings (1–5) for 3952 movies by 6040 users. This dataset is nearly 95.75% sparse. The

Conclusion

Over the past decade, CF has remained one of the most popular and widely accepted methods for handling the information overload problem effectively. It aims to suggest suitable items to a user based on rating information collected from other, similar users.

Although CF methods are very successful and popular in many areas, they often confront the sparsity problem. In this paper, we propose to merge item-based and user-based CF in a weighted-sum approach. The item-based CF and user-based CF take

Surya Kant received the B.Tech degree in Computer Science & Engineering from UPTU in 2009 and the M.Tech from NIT Jalandhar in 2012. Currently, he is a Ph.D. student with the Department of Polymer and Process Engineering, IIT Roorkee, India. His research interests include Data Mining and Machine Learning.

References (44)

A.H. Celdrán et al.
Design of a recommender system based on users’ behavior and collaborative location and tracking
J. Comput. Sci.
(2016)
M. Ramezani et al.
A pattern mining approach to enhance the accuracy of collaborative filtering in sparse data domains
Phys. A Stat. Mech. Appl.
(2014)
S. Frémal et al.
Weighting strategies for a recommender system using item clustering based on genres
Expert Syst. Appl.
(2017)
C.X. Zhang et al.
Information filtering via collaborative user clustering modeling
Phys. A Stat. Mech. Appl.
(2014)
M.L. Wu et al.
Integrating content-based filtering with collaborative filtering using co-clustering with augmented matrices
Expert Syst. Appl.
(2014)
A.L. Vizine Pereira et al.
Simultaneous co-clustering and learning to address the cold start problem in recommender systems
Knowl.-Based Syst.
(2015)
H. Liu et al.
A new user similarity model to improve the accuracy of collaborative filtering
Knowl.-Based Syst.
(2014)
J. Bobadilla et al.
Recommender systems survey
Knowl.-Based Syst.
(2013)
S. Zahra et al.
Novel centroid selection approaches for KMeans-clustering based recommender systems
Inf. Sci. (NY)
(2015)
Hyun-Tae Kim et al.
A recommender system based on genetic algorithm for music data in 2010
2nd International Conference on Computer Engineering and Technology
(2010)

R. Katarya et al.
A collaborative recommender system enhanced with particle swarm optimization technique
Multimed. Tools Appl.
(2016)
V.K. Jain et al.
Extraction of emotions from multilingual text using intelligent text processing and computational linguistics
J. Comput. Sci.
(2017)
H. Kim, J. An, C. Wook, A Novel Evolutionary Approach to Recommender Systems, no. xx, pp. 2–5,...
A.M. Rashid et al.
ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm
Search
(2006)
J.-M. Yang, K. F. Li, Recommendation based on rational inferences in collaborative filtering,...
Y. Shi et al.
Collaborative filtering beyond the user-Item matrix
ACM Comput. Surv.
(2014)
Y. Cai et al.
Typicality-based collaborative filtering recommendation
IEEE Trans. Knowl. Data Eng.
(2014)
K.G. Saranya, G.S. Sadasivam, Modified Heuristic Similarity Measure for Personalization using Collaborative Filtering...
B.M. Sarwar et al.
Recommender systems for large-scale E-Commerce: scalable neighborhood formation using clustering
Communications
(2002)
B. Sarwar et al.
Item-based collaborative filtering recommendation algorithms
Proc. 10th …
(2001)
S. Kant et al.
An improved K means clustering with Atkinson index to classify liver patient dataset
Int. J. Syst. Assur. Eng. Manage.
(2016)
G. Costa et al.
Model-Based collaborative personalized recommendation on signed social rating networks
ACM Trans. Internet Technol.
(2016)


Tripti Mahara received the B.E. degree in Computer Science & Engineering from Sardar Patel University in 1999, and the M.Tech and Ph.D. from Industrial and Management Engineering, I.I.T Kanpur, in 2004 and 2009, respectively. Currently, she is an Assistant Professor with the Department of Polymer and Process Engineering, IIT Roorkee, India. Her research interests include ERP and Data Mining.
