Kernel meets recommender systems: A multi-kernel interpolation for matrix completion

https://doi.org/10.1016/j.eswa.2020.114436

Highlights

  • Propose a kernelized matrix completion framework via multi-kernel interpolation.

  • Learn an effective low-dimensional representation in an infinite Hilbert space.

  • Provide a feasible solution to make the raw input data linearly separable.

  • Present an auto-weighted method for multi-kernel representation and fusion.

Abstract

A primary research direction for recommender systems is matrix completion, which attempts to recover the missing values in a user–item rating matrix. Numerous approaches exist for rating tasks, and they are mainly classified into latent factor models and neighborhood-based models. Most neighborhood-based models seek similar neighbors by computing similarities in the original data space for the final predictions. In this paper, we propose a new neighborhood-based interpolation model within a kernelized matrix completion framework, in which the impact weights contributed by neighbors are computed in a new Hilbert space that contains richer features. In our model, the kernel function is combined with a similarity measurement to achieve a better approximation of the unknown ratings. Furthermore, we extend our model with a non-linear multi-kernel framework that learns the kernel weights automatically. Finally, we conduct extensive experiments on several real-world datasets. The outcomes show that the proposed methods work effectively and improve the performance of the rating prediction task compared with both traditional and state-of-the-art approaches.

Introduction

Recommender systems are widely employed in various spheres and have become a popular research topic in recent decades (Hwang et al., 2016, Qian et al., 2019, Wang, Zhou and Lu, 2019). For instance, the media-services provider Netflix held the Netflix Prize competition to explore algorithms for predicting user ratings of movies. This task can also be cast as a matrix completion problem, which recovers missing ratings in a rating matrix. Table 1 provides a simple example of an incomplete user–item rating matrix; most of its ratings are missing. In real-world datasets, rating matrices are even sparser, which leads to the cold-start problem in recommender systems. The primary target of matrix completion is to recover the missing entries of the user–item rating matrix. A classical solution is nonnegative matrix factorization (NMF) (Lee & Seung, 1999), which factorizes the incomplete matrix into two low-rank matrices. A variety of other methods have been proposed for matrix completion. Kang et al. (2016) completed the rating matrix under a low-rank assumption, adopting a nonconvex rank relaxation to achieve a better rank approximation. Xue et al. (2017) leveraged two parallel deep neural networks to factorize a user–item interaction matrix and predict the unknown ratings. Inspired by word embedding models, Liang et al. (2016) jointly factorized the user–item interaction matrix and the item–item co-occurrence matrix with shared item latent factors.
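To make the factorization idea concrete, the following is a minimal sketch of NMF-style completion restricted to the observed entries, assuming numpy. The function name, learning rate, and projected-gradient updates are illustrative choices, not the multiplicative-update formulation of Lee & Seung (1999).

```python
import numpy as np

def nmf_complete(Y, mask, rank=10, iters=200, lr=0.005, reg=0.02, seed=0):
    """Masked NMF sketch: fit nonnegative factors W (users x rank) and
    H (rank x items) to the observed entries of Y, then return W @ H
    as a dense estimate of the complete matrix."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        E = mask * (Y - W @ H)           # reconstruction error on observed entries only
        W += lr * (E @ H.T - reg * W)    # gradient step for user factors
        H += lr * (W.T @ E - reg * H)    # gradient step for item factors
        W = np.maximum(W, 0)             # project back onto the nonnegative orthant
        H = np.maximum(H, 0)
    return W @ H
```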

Collaborative filtering (CF) has been widely investigated by many researchers and is a mature technique that has been utilized extensively in industry (Chen et al., 2017, Chen et al., 2019, Wang, Zhou, Chen et al., 2019). The algorithm attempts to uncover the hidden relationships between users and items in a data-driven manner and recommends similar items to users with similar interests. There are two primary types of CF: latent factor models (LFMs) and neighborhood-based models (NBMs). LFMs discover the latent features of users or items and project them into feature vectors that are generally low-dimensional. Matrix factorization (MF) is a typical LFM, which factorizes the raw rating matrix into two low-rank matrices known as the user latent matrix and the item latent matrix. The unknown ratings are predicted by the dot product of the corresponding latent vectors. A large number of MF-based models have been proposed in recent decades. For example, Koren (2008) proposed a singular value decomposition (SVD) based model named SVD++ that considered the influence of the neighborhood. Ning and Karypis (2011) presented a sparse linear model (SLIM) that explored an item–item similarity matrix by factorizing the original user–item interaction matrix. Wang et al. (2018) employed a confidence-aware MF framework to optimize both the precision of rating estimation and the prediction confidence.
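The LFM prediction rule itself is a one-liner. Below is a hedged illustration of a biased MF predictor in the style of SVD-type models; the bias terms and factor matrices are assumed to have been learned already (e.g., by stochastic gradient descent), and all names are hypothetical.

```python
import numpy as np

def predict_rating(mu, b_user, b_item, P, Q, u, i):
    """Biased MF prediction: global mean + user bias + item bias
    + dot product of the corresponding latent vectors.
    P is (num_users, rank); Q is (num_items, rank)."""
    return mu + b_user[u] + b_item[i] + P[u] @ Q[i]
```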

Different from LFMs, NBMs aim to find similar users or items by computing the similarities among them and make estimations by considering the influence or contribution of each neighbor. A classic NBM algorithm is the k-nearest neighbors (KNN) approach (Sarwar et al., 2001). Item-based KNN models calculate the similarities among items and then sort them for top-k recommendations. For example, Park et al. (2015) proposed a KNN-based CF model named reversed CF, which utilized a KNN graph to locate the nearest neighbors of the rated items. For rating tasks, models commonly compute an unknown rating as a weighted average of other existing ratings, where each neighbor contributes to the final estimate according to its similarity to the target. Fig. 1 illustrates this scheme. By measuring the similarities or distances between the point to be predicted and the existing points, we can select the most relevant points to estimate the unknown value. Accordingly, a smaller distance between data points should lead to a stronger influence, meaning that similar points contribute more to the recovery of unknown data.
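A minimal sketch of this similarity-weighted average for a user-based NBM, assuming numpy, a boolean observation mask, and a precomputed user–user similarity matrix; the fallback rules for cold cases are our own illustrative choices.

```python
import numpy as np

def knn_predict(Y, mask, sim, u, i, k=20):
    """Estimate rating (u, i) as a similarity-weighted average over the
    k most similar users who have rated item i.
    Y: ratings matrix; mask: boolean, True on observed entries;
    sim: precomputed user-user similarity matrix."""
    raters = np.where(mask[:, i] & (np.arange(Y.shape[0]) != u))[0]
    if raters.size == 0:
        return float(Y[mask].mean())              # no raters: fall back to global mean
    top = raters[np.argsort(-sim[u, raters])][:k] # k most similar raters of item i
    w = sim[u, top]
    if np.allclose(w.sum(), 0):
        return float(Y[top, i].mean())            # degenerate weights: plain average
    return float(w @ Y[top, i] / w.sum())         # similarity-weighted average
```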

Kernel learning is a technique that applies kernel functions to map raw data into a high-dimensional space without explicitly computing the corresponding projection functions. It is best known from support vector machines (SVMs), where it makes linearly inseparable raw data separable in a high-dimensional space. Among the various kernel functions, radial basis function (RBF) kernels such as the Gaussian kernel are the most widely used and are often leveraged to train RBF networks. RBF kernels have been extensively applied in recommender systems and improve the performance of MF-based approaches (Liu et al., 2016, Pal and Jenamani, 2018, Zhou et al., 2012). Because RBF kernels can calculate the similarities among samples, and the performance of NBMs is closely tied to the similarity metric, we consider them a powerful technique for improving NBMs as well. Indeed, RBF kernels have been applied in many fields, such as feature selection (Kuo et al., 2013), clustering (Cruz et al., 2016) and image processing (Romani et al., 2019), owing to their ability to measure similarity. Nevertheless, to our knowledge, few studies have been devoted to applying RBF kernels to NBMs for recommender systems.
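The Gaussian kernel $k(x_a, x_b) = \exp(-\gamma \|x_a - x_b\|^2)$ turns distances into similarities in $(0, 1]$, with closer points scoring higher. A minimal numpy sketch of the full kernel matrix follows; $\gamma$ is a bandwidth hyperparameter, and the vectorized distance computation is a standard trick rather than anything specific to this paper.

```python
import numpy as np

def gaussian_kernel(X, gamma=0.5):
    """RBF (Gaussian) kernel matrix K[a, b] = exp(-gamma * ||x_a - x_b||^2)
    for the rows of X. Entries lie in (0, 1] and grow as two points get
    closer, so the kernel doubles as a similarity measure."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0))      # clip tiny negatives from roundoff
```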

In this paper, we propose a new kernel-based matrix completion (KMC) framework for recommender systems, which addresses rating tasks with NBMs on a user–item interaction matrix. The model applies RBF kernels that are reformulated with similarity measures and estimates the rating a user would give a specific item. Derived from the interpolation condition, the proposed KMC admits a closed-form solution computed from kernel matrices, which speeds up the rating predictions for a specific user or item (a minimal sketch of this interpolation idea is given after the contribution list below). Moreover, we improve this model with a multi-kernel framework for KMC (M-KMC) that merges the different features of the latent spaces generated by diverse kernels. Different from the extensively used linear combination of kernels, M-KMC applies a non-linear auto-weighted strategy to merge kernels. In summary, our contributions are as follows:

  1. We propose a kernelized model with a closed-form solution for matrix completion, which applies the interpolation method for rating prediction.

  2. In our proposed model, the similarity metric is combined with the Gaussian kernel to compute the weights of neighbors, which yields a more precise approximation of unknown ratings.

  3. M-KMC is presented with the multi-kernel framework, which adaptively adjusts the weights of the multiple kernel functions and improves the performance of KMC.

  4. We conduct extensive experiments on KMC and M-KMC and discuss the effect of different parameters. Our models achieve performance that is competitive with or superior to traditional and state-of-the-art models.
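As promised above, here is a minimal sketch of the closed-form interpolation idea: under the interpolation condition, the neighbor weights $w$ solve $Kw = y$ on the observed ratings, and a new rating is predicted as a kernel-weighted combination of neighbors. This reflects standard RBF interpolation under our reading of the framework; the paper's exact KMC formulation, with its similarity-reformulated kernels and multi-kernel fusion, may differ, and the small ridge term is our own numerical-stability assumption.

```python
import numpy as np

def kernel_interpolate(K_nn, y_n, k_target, ridge=1e-6):
    """Kernel interpolation for one target rating.

    K_nn     : kernel matrix among the n neighbors (n x n)
    y_n      : the neighbors' observed ratings (n,)
    k_target : kernel values between the target point and each neighbor (n,)
    ridge    : small diagonal term for numerical stability (an assumption,
               not necessarily part of the paper's closed form)
    """
    n = K_nn.shape[0]
    w = np.linalg.solve(K_nn + ridge * np.eye(n), y_n)  # weights from K w = y
    return float(k_target @ w)                          # predicted rating
```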

Section snippets

Neighborhood-based models

NBMs are commonly used techniques in recommender systems. These models compute the similarities or correlations among different users or items, based on rating records or extracted latent features. A common metric is the cosine similarity, which calculates the cosine of the angle between two vectors. For a user-based similarity measure, assume that $I_{uv} = \{1, \ldots, n\}$ is the set of items that both user $u$ and user $v$ have co-rated; then the vectors $Y_u = \{y_{u1}, \ldots, y_{un}\}$ and $Y_v = \{y_{v1}, \ldots, y_{vn}\}$ are the rating vectors
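The snippet cuts off here, but the standard user-based cosine similarity on co-rated items is $\cos(u, v) = \frac{Y_u \cdot Y_v}{\|Y_u\| \, \|Y_v\|}$. A minimal sketch under that standard definition (all names are illustrative):

```python
import numpy as np

def cosine_similarity_corated(Y, mask, u, v):
    """User-based cosine similarity restricted to the items that
    both user u and user v have rated (the set I_uv)."""
    co = mask[u] & mask[v]                 # boolean indicator of co-rated items
    if not co.any():
        return 0.0                          # no overlap: no evidence of similarity
    yu, yv = Y[u, co], Y[v, co]
    denom = np.linalg.norm(yu) * np.linalg.norm(yv)
    return float(yu @ yv / denom) if denom > 0 else 0.0
```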

Our proposed models

Before describing our proposed methods, we first explain the primary mathematical notations used in this section. The set $\mathbb{R}^{m \times n}$ is the space of $m \times n$ real matrices. Assume there are $m$ users and $n$ items in the dataset; all observed user–item rating pairs are stored in the set $\Omega = \{(u, i) \mid y_{ui} \text{ is observed}\}$, and the set $\bar{\Omega}$ represents the pairs whose ratings are missing. Then
$$Y_{ui} = \begin{cases} y_{ui}, & (u, i) \in \Omega \\ \text{null}, & (u, i) \in \bar{\Omega} \end{cases}$$
denotes the user–item interaction matrix. The similarity between two data points is
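In code, this notation corresponds to a partially observed matrix plus an observation mask. A small helper is sketched below, assuming observed ratings arrive as (u, i, y) triples and using NaN for the null entries; both choices are ours, not the paper's.

```python
import numpy as np

def build_interaction_matrix(ratings, m, n):
    """Assemble the user-item interaction matrix from observed (u, i, y)
    triples: Y[u, i] = y for (u, i) in Omega, NaN (null) elsewhere."""
    Y = np.full((m, n), np.nan)            # unobserved pairs (Omega-bar) stay null
    mask = np.zeros((m, n), dtype=bool)    # True exactly on Omega
    for u, i, y in ratings:
        Y[u, i] = y
        mask[u, i] = True
    return Y, mask
```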

Experiments and analysis

In this section, we conduct several experiments with our proposed KMC and M-KMC on real-world datasets from different recommendation environments. We compare performance under different parameter settings to analyze parameter sensitivity. Finally, we compare our proposed models with both traditional and state-of-the-art methods on the same metrics to demonstrate the feasibility of our models.

Conclusion

In this paper, we proposed a kernel-based framework, KMC, for neighborhood-based recommender systems, which aimed to recover missing ratings in the user–item interaction matrix. The model projected the original data into a new Hilbert space with RBF kernels and realized a local matrix approximation. Under the interpolation condition, the weights of the different neighbors were computed from kernel matrices, so that the final estimations were obtained as the dot product of weight vectors

CRediT authorship contribution statement

Zhaoliang Chen: Conceptualization, Formal analysis, Methodology, Writing - original draft. Wei Zhao: Conceptualization, Formal analysis, Methodology, Writing - revision. Shiping Wang: Funding acquisition, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Nos. U1705262 and 61672159), the Technology Innovation Platform Project of Fujian Province, China (Nos. 2014H2005 and 2009J1007), the Fujian Collaborative Innovation Center for Big Data Application in Governments, and the Fujian Engineering Research Center of Big Data Analysis and Processing.

References (33)

  • Fan, J., et al. Polynomial matrix completion for missing data imputation and transductive learning.

  • Kang, Z., Peng, C., & Cheng, Q. (2016). Top-N recommender system via matrix completion. In Proceedings of the 30th AAAI...

  • Koren, Y. (2008). Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of...

  • Kuchaiev, O., & Ginsburg, B. (2018). Training deep autoencoders for recommender systems. In Proceedings of the 6th...

  • Kuo, B.-C., et al. (2013). A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

  • Lee, J., Kim, S., Lebanon, G., & Singer, Y. (2013). Local low-rank matrix approximation. In Proceedings of the 30th...