1 Introduction

The collaborative filtering strategy has been widely adopted in recommendation systems, where users with similar preferences receive similar recommendations [13]. User preferences are expressed explicitly in the form of ratings, or implicitly in the form of views, clicks, purchases, and so on. Representative collaborative filtering strategies are matrix factorization techniques, which factorize the data matrix of user preferences in a single domain (e.g., music or video) to reveal the latent associations between users and items [14]. However, data sparsity and cold-start problems degrade the recommendation accuracy, as there are only a few preferences on which to base the recommendations in a single domain [5, 13].

With the advent of social media platforms and e-commerce systems, such as Amazon and Epinions, users express their preferences in multiple domains. For example, in Amazon users can rate items from different domains, such as books and DVDs, and users express their opinions on different social media platforms, such as Facebook and Twitter. In the effort to overcome the data sparsity and cold-start problems, several cross-domain recommendation strategies have been proposed, which exploit the additional information of user preferences in multiple auxiliary domains to improve the recommendation accuracy in a target domain [15]. However, generating cross-domain recommendations is a challenging task [5, 23]; for example, if the auxiliary domains are richer than the target domain, algorithms learn how to recommend items in the auxiliary domains and treat the target domain as noise. Moreover, the auxiliary domains themselves might be a source of noise: if user preferences differ across domains, the auxiliary domains introduce noise into the learning of the target domain. Therefore, a pressing challenge is how to transfer the knowledge of user preferences between different domains.

In cross-domain recommendation, the auxiliary domains can be categorized based on user and item overlap, that is, full, partial, or no user/item overlap between the domains [5]. In this study, we focus on partial user overlap between the target and the auxiliary domains, as it reflects the real-world setting [8]. Relevant methods, such as [8, 20], form user and item clusters to capture the relationships between multiple domains at a cluster level, thus tackling the sparsity problem, and then weigh the cluster-based and user-based preferences to generate the top-N recommendations in the target domain. However, existing cluster-based cross-domain strategies have the following limitations: they either form non-adaptive user and item clusters in a common latent space when computing the cluster-based associations [8], or they linearly combine the cluster-based and user-based relationships in the target domain [20].

1.1 Contribution

In this study, we overcome the aforementioned limitations with a novel approach for joint cross-domain recommendation based on adaptive user clustering and similarity learning. Our main contribution is summarized as follows: (i) we formulate an objective function to jointly learn the user-based and cluster-based similarities across multiple domains, while adapting the user clusters in each domain at the same time. Since the user-based and cluster-based similarities across multiple domains are sparse in practice, we apply \(L_{2,1}\)-norm regularization to enforce sparse solutions. (ii) We propose an efficient alternating optimization algorithm to minimize the joint objective function, thus computing the user-based similarities across the multiple domains. The user latent factors are weighted based on the calculated user-based similarities to generate the final top-N recommendations. Our experiments on ten cross-domain recommendation tasks demonstrate the superiority of the proposed approach over competitive cross-domain strategies.

The remainder of the paper is organized as follows: Sect. 2 reviews related work and Sect. 3 formally defines the cross-domain recommendation problem. Section 4 formulates the proposed joint objective function, Sect. 5 presents our alternating optimization algorithm, and Sect. 6 elaborates on how to generate the top-N cross-domain recommendations. Finally, Sect. 7 presents the experimental results and Sect. 8 concludes the study.

2 Related Work

Cross-domain recommendation algorithms differ in how the knowledge of user preferences from the auxiliary domains is exploited when generating the recommendations in the target domain [15, 23]. For example, various cross-domain approaches aggregate user preferences into a unified matrix, on which weighted single-domain techniques are applied, such as user-based kNN [2]. The graph-based method of [6] models the similarity relationships as a directed graph and explores all possible paths connecting users or items to capture the cross-domain relationships. Other methods exploit side information when linking multiple domains, on condition that the domains are linked by common knowledge, such as overlapping user/item attributes [4], social tags [1], and semantic networks [12]. However, such side information is not always available [20].

Other cross-domain techniques assume that the auxiliary and target domains are related by means of shared latent features. Representative methods are tri-matrix co-factorization techniques, where user and item latent factors are shared between domains with different user preference patterns. For example, Pan et al. [22] transfer the knowledge of user preferences from different domains with heterogeneous forms of user feedback, that is, explicit or implicit feedback, to compute the shared latent features. Hu et al. [10] model a cubic user-item-domain matrix (tensor) and construct the respective latent space by tensor factorization, based on which the cross-domain recommendations are generated.

More closely related to our approach, several cross-domain strategies transfer patterns of user preferences between domains at a cluster level. Li et al. [16] calculate user and item clusters for each domain, and then encode the cluster-based patterns in a shared codebook; finally, the knowledge of user preferences is transferred across domains through the shared codebook. Gao et al. [8] compute the latent factors of user clusters and item clusters to construct a common latent space, which represents the preference patterns, e.g., rating patterns, of user clusters on item clusters. Then, the common cluster-based preference pattern that is shared across domains is learnt following a subspace strategy, so as to control the optimal level of sharing among multiple domains. Mirbakhsh and Ling [20] factorize a cluster-based coarse matrix to capture the shared interests among user and item clusters; factorizing the coarse matrix yields the preferences of users on items at a cluster level, and linearly combining these cluster-based preferences with the individual user preferences improves the recommendation accuracy. However, both [8] and [20] use non-adaptive clustering strategies when computing the cluster-based similarities.

Meanwhile, several graph-based algorithms perform clustering on multiple domains, such as the studies reported in [3, 7]. However, these techniques focus on grouping instances, e.g., users from different domains, and do not generate cross-domain recommendations.

3 Problem Formulation

3.1 Notation

Our notation is presented in Table 1. We assume that we have d different domains, where \(n_p\) and \(m_p\) are the numbers of users and items in the p-th domain, respectively. In matrix \(\mathbf {R}_p\), we store the user preferences on items, in the form of explicit feedback, e.g., ratings, or implicit feedback, e.g., number of views, clicks, and so on. Based on the matrix \(\mathbf {R}_p\), we capture the user-based similarities in the p-th domain. If users i and j have interacted with at least one common item q, then users i and j are connected. The connections/similarities are stored in an adjacency matrix \(\mathbf {A}_p\), whose ij-th entries are calculated as follows [24]:

$$\begin{aligned} (\mathbf {A}_p)_{ij}={\left\{ \begin{array}{ll} 1, &{} \text {if } \exists q: (\mathbf {R}_p)_{iq}\ne 0 \text { and } (\mathbf {R}_p)_{jq}\ne 0\\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

with i, \(j=1,\ldots ,n_p\).
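For illustration, the construction in (1) amounts to one matrix product on the interaction indicators; below is a minimal numpy sketch, where the function name is illustrative and zeroing the diagonal (no self-connections) is an assumption of the sketch rather than part of (1).

```python
import numpy as np

def adjacency(R_p: np.ndarray) -> np.ndarray:
    """Binary user adjacency matrix A_p of (1): users i and j are
    connected iff they interacted with at least one common item."""
    interacted = (R_p != 0).astype(float)   # n_p x m_p interaction indicator
    common = interacted @ interacted.T      # (i, j) = number of common items
    A_p = (common > 0).astype(float)        # connect on >= 1 common item
    np.fill_diagonal(A_p, 0.0)              # drop self-connections (assumption)
    return A_p
```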

Table 1. Notation.

3.2 Cross-Domain Similarities

In our approach, we consider two types of cross-domain similarities, that is, the cluster-based and the user-based cross-domain similarities, defined as follows:

Definition 1

(Cluster-based cross-domain similarities). For the p-th domain, we consider a cluster assignment matrix \(\mathbf {C}_p\), with \((\mathbf {C}_p)_{ic}\) expressing the probability that user i belongs to cluster c. We define a cluster-based cross-domain matrix \(\mathbf {Y}_{pk} \in \mathfrak {R}^{c_p \times c_k}\), where \(c_p\) and \(c_k\) are the numbers of user clusters in the p-th and k-th domains, respectively. The entry \((\mathbf {Y}_{pk})_{ij}\) expresses the similarity between clusters i and j in the p-th and k-th domains, respectively.

Definition 2

(User-based cross-domain similarities). We define a cross-domain matrix \(\mathbf {X}_{pk}\) between users in domains p and k. The entry \((\mathbf {X}_{pk})_{ij}\) expresses the similarity between users i and j in domains p and k, respectively.

3.3 Problem Definition

In the cross-domain recommendation task, we assume that we have a target domain p and \(d-1\) auxiliary domains. The goal is to predict the missing user preferences on items (recommendations) in the target domain p, while considering the user-based similarities in the \(d-1\) auxiliary domains. Following the notation of matrix factorization, let \(\mathbf {U}_p \in \mathfrak {R}^{n_p \times l}\) and \(\mathbf {V}_p \in \mathfrak {R}^{m_p \times l}\) be the user and item latent factor matrices, with the factorized matrix \(\mathbf {\hat{R}}_p=\mathbf {U}_p\mathbf {V}_p^T \in \mathfrak {R}^{n_p \times m_p}\) containing the missing user preferences on items. As the i-th row of matrix \(\mathbf {U}_p\), denoted by \((\mathbf {U}_p)_{i*}\), contains the l-dimensional latent factor of user i, we can use a social regularization term \(\varOmega (\mathbf {U}_p)\) when learning the factorized matrix \(\mathbf {\hat{R}}_p\), as follows [19]:

$$\begin{aligned} \min _{\mathbf {U}_p,\mathbf {V}_p}||\mathbf {R}_p-\mathbf {U}_p\mathbf {V}_p^T||_F^2+\gamma (||\mathbf {U}_p||_F^2+||\mathbf {V}_p||_F^2) + \varOmega (\mathbf {U}_p) \end{aligned}$$
(2)

where the first term is the approximation error between the factorized matrix \(\mathbf {\hat{R}}_p\) and the initial data matrix \(\mathbf {R}_p\); the second is the regularization term to avoid model overfitting, with parameter \(\gamma >0\); and the third term is the social regularization term based on the partial user overlaps between the target domain and the \(d-1\) auxiliary domains. In the social regularization term \(\varOmega (\mathbf {U}_p)\), we weigh the influence of the user latent factors based on the user-based cross-domain similarities in \(\mathbf {X}_{pk}\) (Definition 2) as follows:

$$\begin{aligned} \varOmega (\mathbf {U}_p) = \sum \limits _{ij}^{n_p} \frac{1}{d-1}\sum \limits _{k=1}^{d-1} (\mathbf {X}_{pk})_{ij}||(\mathbf {U}_p)_{i*}-(\mathbf {U}_p)_{j*}||^2, \quad \text {with } p\ne k \end{aligned}$$
(3)

The term in the sum expresses the approximation error between the user latent factors, weighted by the user-based cross-domain similarities in \(\mathbf {X}_{pk}\). The goal of the proposed approach is formally defined as follows:

Definition 3

(Problem). The goal of the proposed approach is to calculate the weights in \(\mathbf {X}_{pk}\) based on the preferences that users have in the different domains, in order to weigh the approximation error between the user latent factors \((\mathbf {U}_p)_{i*}\) and \((\mathbf {U}_p)_{j*}\) in (3).
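To make the problem concrete, the following minimal numpy sketch evaluates the social regularization term (3) once the weights in \(\mathbf {X}_{pk}\) are known. Mapping each \(\mathbf {X}_{pk}\) onto pairs of target-domain users (an \(n_p \times n_p\) layout over the overlapping users) is a simplifying assumption of the sketch, as are the identifier names.

```python
import numpy as np

def social_reg(U_p, X_list):
    """Social regularization term of (3).

    U_p    : n_p x l user latent factors of the target domain.
    X_list : d-1 similarity matrices, assumed here to be mapped
             onto target-domain user pairs (n_p x n_p each).
    """
    sq_norms = (U_p ** 2).sum(axis=1)
    # ||u_i - u_j||^2 = ||u_i||^2 + ||u_j||^2 - 2 u_i . u_j, for all pairs
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * U_p @ U_p.T
    omega = sum((X * dists).sum() for X in X_list)
    return omega / len(X_list)   # the 1/(d-1) factor of (3)
```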

4 Joint Cross-Domain Objective Function

User clustering. To simplify the presentation, from now on we assume that we have a target domain p and an auxiliary domain k. Given the adjacency matrix \(\mathbf {A}_p\) (computed in (1)), first we have to define the objective function for performing user clustering on the p-th domain, that is, to calculate the cluster assignment matrix \(\mathbf {C}_p\), which corresponds to the following minimization problem:

$$\begin{aligned} \begin{array}{c} \min \limits _{\mathbf {C}_p}\sum \limits _{ij} (\mathbf {A}_p)_{ij}||(\mathbf {C}_p)_{i*}-(\mathbf {C}_p)_{j*}||^2\\ \text {subject to } \mathbf {C}_p^T\mathbf {C}_p=\mathbf {I}, \mathbf {C}_p\ge 0 \end{array} \end{aligned}$$
(4)

with orthogonality constraints on the cluster matrix \(\mathbf {C}_p\), and the user assignments to clusters being 0 or positive. According to the Laplacian method of [9], the minimization problem of (4) is equivalent to:

$$\begin{aligned} \begin{array}{c} \min \limits _{\mathbf {C}_p}\sum \limits _{ij} (\mathbf {A}_p)_{ij}||(\mathbf {C}_p)_{i*}-(\mathbf {C}_p)_{j*}||^2=\min \limits _{\mathbf {C}_p}Tr(\mathbf {C}_p^T\mathbf {L}_p\mathbf {C}_p)\\ \text {subject to } \mathbf {C}_p^T\mathbf {C}_p=\mathbf {I}, \mathbf {C}_p\ge 0 \end{array} \end{aligned}$$
(5)

where \(Tr(\cdot )\) is the trace operator. Matrix \(\mathbf {L}_p \in \mathfrak {R}^{n_p \times n_p}\) is the Laplacian of the adjacency matrix \(\mathbf {A}_p\), computed as \(\mathbf {L}_p=\mathbf {D}_p-\mathbf {A}_p\), where \(\mathbf {D}_p\in \mathfrak {R}^{n_p \times n_p}\) is a diagonal matrix whose entries are the row sums \((\mathbf {D}_p)_{ii}=\sum \limits _{j}(\mathbf {A}_p)_{ij}\). Similarly, we define the respective objective function in (5) for performing user clustering on the auxiliary domain k, with cluster assignment matrix \(\mathbf {C}_k \in \mathfrak {R}^{n_k \times c_k}\).
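In code, the Laplacian is a direct transcription of these definitions; a minimal numpy sketch:

```python
import numpy as np

def laplacian(A_p: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian L_p = D_p - A_p, where
    (D_p)_ii is the i-th row sum of the adjacency matrix."""
    D_p = np.diag(A_p.sum(axis=1))
    return D_p - A_p
```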

Cluster-based and User-based Similarities. To compute the cluster-based and user-based similarities between domains p and k, we follow a co-clustering strategy [7], where we have to minimize the following objective function:

$$\begin{aligned} \begin{array}{c} \min \limits _{\mathbf {Y}_{pk},\mathbf {X}_{pk}} ||\mathbf {X}_{pk}-\mathbf {C}_p\mathbf {Y}_{pk}\mathbf {C}_k^T||_F^2 + \lambda _{x}||\mathbf {X}_{pk}||_{2,1}+\lambda _{y}||\mathbf {Y}_{pk}||_{2,1}\\ \\ \text {subject to } \mathbf {Y}_{pk}^T\mathbf {Y}_{pk}=\mathbf {I}, \mathbf {Y}_{pk}\ge 0, \mathbf {X}_{pk}\ge 0 \end{array} \end{aligned}$$
(6)

with orthogonality constraints on the cluster-based matrix \(\mathbf {Y}_{pk}\), and the user-based and cluster-based (cross-domain) similarities being nonnegative. Symbol \(||\cdot ||_{2,1}\) denotes the \(L_{2,1}\) norm of a matrix, which is calculated as follows [21]:

$$\begin{aligned} ||\mathbf {X}_{pk}||_{2,1}=\sum \limits _{i=1}^{n_p}\sqrt{\sum \limits _{j=1}^{n_k}{(\mathbf {X}_{pk})_{ij}}^2}=\sum \limits _{i=1}^{n_p}||(\mathbf {X}_{pk})_{i*}||_2 \end{aligned}$$
(7)

The \(L_{2,1}\) regularization terms in (6) force the solutions of matrices \(\mathbf {X}_{pk}\) and \(\mathbf {Y}_{pk}\) to be sparse, reflecting the real-world scenario where the user-based and cluster-based cross-domain similarities are usually sparse [5]. Parameters \(\lambda _x, \lambda _y>0\) control the respective \(L_{2,1}\) regularization terms in (6).
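As a quick illustration of (7), a one-function numpy sketch; since the norm sums the Euclidean norms of the rows, penalizing it drives entire rows of the matrix to zero:

```python
import numpy as np

def l21_norm(X: np.ndarray) -> float:
    """L_{2,1} norm of (7): the sum of the Euclidean row norms."""
    return np.linalg.norm(X, axis=1).sum()
```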

Joint Objective Function. By combining (i) the objective function in (6) with (ii) the two clustering objective functions in (5) for domains p and k, we have to minimize the following joint objective function:

$$\begin{aligned} \begin{array}{c} \min \limits _{\mathbf {C}_k,\mathbf {C}_p,\mathbf {Y}_{pk},\mathbf {X}_{pk}} \mathcal {F} = ||\mathbf {X}_{pk}-\mathbf {C}_p\mathbf {Y}_{pk}\mathbf {C}_k^T||_F^2 + \lambda _{x}||\mathbf {X}_{pk}||_{2,1}+\lambda _{y}||\mathbf {Y}_{pk}||_{2,1}\\ +\,\beta _p Tr(\mathbf {C}_p^T\mathbf {L}_p\mathbf {C}_p) + \beta _k Tr(\mathbf {C}_k^T\mathbf {L}_k\mathbf {C}_k)\\ \text {subject to } \mathbf {C}_p^T\mathbf {C}_p=\mathbf {I}, \mathbf {C}_k^T\mathbf {C}_k=\mathbf {I}, \mathbf {Y}_{pk}^T\mathbf {Y}_{pk}=\mathbf {I}, \mathbf {C}_p, \mathbf {C}_k, \mathbf {Y}_{pk}, \mathbf {X}_{pk}\ge 0 \end{array} \end{aligned}$$
(8)

where \(\beta _p,\beta _k>0\) are the regularization parameters for the clusterings in domains p and k, respectively.

5 Alternating Optimization

As the joint objective function \(\mathcal {F}(\mathbf {C}_k,\mathbf {C}_p,\mathbf {Y}_{pk},\mathbf {X}_{pk})\) in (8) is not jointly convex with respect to the four variables/matrices, we propose an alternating optimization algorithm, where we update one variable while keeping the remaining three fixed. The cluster assignment matrices \(\mathbf {C}_k,\mathbf {C}_p\) are initialized by performing k-means on the respective adjacency matrices \(\mathbf {A}_k\) and \(\mathbf {A}_p\), while \(\mathbf {Y}_{pk}\) and \(\mathbf {X}_{pk}\) are initialized by random (sparse) matrices. Next, we present the updating step for each variable.

Step 1, fix \(\mathbf {C}_p\), \(\mathbf {Y}_{pk}\), \(\mathbf {X}_{pk}\) and update \(\mathbf {C}_k\). By considering the optimality condition \(\partial \mathcal {F}/\partial \mathbf {C}_k=0\), we calculate the partial derivative of \(\mathcal {F}\) with respect to \(\mathbf {C}_k\):

$$\begin{aligned} \frac{\partial \mathcal {F}}{\partial \mathbf {C}_k}=-2\mathbf {X}_{pk}^T \mathbf {C}_p \mathbf {Y}_{pk} + 2\mathbf {C}_k \mathbf {Y}_{pk}^T \mathbf {C}_p^T \mathbf {C}_p \mathbf {Y}_{pk} + 2\beta _k \mathbf {L}_k \mathbf {C}_k \end{aligned}$$
(9)

As the joint objective function \(\mathcal {F}\) in (8) is subject to the orthogonality constraints \(\mathbf {C}_p^T\mathbf {C}_p=\mathbf {I}\), \(\mathbf {Y}_{pk}^T\mathbf {Y}_{pk}=\mathbf {I}\), the second term of (9) equals \(2\mathbf {C}_k\). By setting the partial derivative equal to zero we have to solve the following equation with respect to \(\mathbf {C}_k\):

$$\begin{aligned} -\mathbf {X}_{pk}^T \mathbf {C}_p \mathbf {Y}_{pk} + \mathbf {C}_k + \beta _k \mathbf {L}_k \mathbf {C}_k = 0 \end{aligned}$$
(10)

As \((\mathbf {I}+\beta _k \mathbf {L}_k)\) is a positive definite matrix, we can obtain the following closed-form solution (updating rule) of \(\mathbf {C}_k\):

$$\begin{aligned} \mathbf {C}_k =(\mathbf {I}+\beta _k \mathbf {L}_k)^{-1}\mathbf {X}_{pk}^T \mathbf {C}_p \mathbf {Y}_{pk} \end{aligned}$$
(11)
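In practice, (11) is best evaluated as a linear system rather than by forming the inverse explicitly, since \((\mathbf {I}+\beta _k \mathbf {L}_k)\) is positive definite; a minimal sketch with illustrative names:

```python
import numpy as np

def update_C_k(X_pk, C_p, Y_pk, L_k, beta_k):
    """Closed-form update (11): C_k = (I + beta_k L_k)^{-1} X_pk^T C_p Y_pk."""
    lhs = np.eye(L_k.shape[0]) + beta_k * L_k   # positive definite
    rhs = X_pk.T @ C_p @ Y_pk                   # n_k x c_k right-hand side
    return np.linalg.solve(lhs, rhs)            # avoids an explicit inverse
```

The updates (14), (18) and (21) below have the same structure and can be evaluated the same way.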

Step 2, fix \(\mathbf {C}_k\), \(\mathbf {Y}_{pk}\), \(\mathbf {X}_{pk}\) and update \(\mathbf {C}_p\). The partial derivative of \(\mathcal {F}\) with respect to \(\mathbf {C}_p\) is:

$$\begin{aligned} \frac{\partial \mathcal {F}}{\partial \mathbf {C}_p}=-2\mathbf {X}_{pk}\mathbf {C}_k \mathbf {Y}_{pk}^T + 2\mathbf {C}_p \mathbf {Y}_{pk} \mathbf {C}_k^T \mathbf {C}_k \mathbf {Y}_{pk}^T + 2\beta _p \mathbf {L}_p \mathbf {C}_p \end{aligned}$$
(12)

Similarly, provided that \(\mathcal {F}\) is subject to \(\mathbf {C}_k^T\mathbf {C}_k=\mathbf {I}\), \(\mathbf {Y}_{pk}^T\mathbf {Y}_{pk}=\mathbf {I}\), we have the optimality condition by setting the partial derivative equal to zero:

$$\begin{aligned} -\mathbf {X}_{pk} \mathbf {C}_k \mathbf {Y}_{pk}^T + \mathbf {C}_p + \beta _p \mathbf {L}_p \mathbf {C}_p =0 \end{aligned}$$
(13)

Given that \((\mathbf {I} + \beta _p \mathbf {L}_p)\) is positive definite, we have the following updating rule for \(\mathbf {C}_p\):

$$\begin{aligned} \mathbf {C}_p = (\mathbf {I} + \beta _p \mathbf {L}_p)^{-1} \mathbf {X}_{pk} \mathbf {C}_k \mathbf {Y}_{pk}^T \end{aligned}$$
(14)

Step 3, fix \(\mathbf {C}_p\), \(\mathbf {C}_k\), \(\mathbf {X}_{pk}\) and update \(\mathbf {Y}_{pk}\). The presence of the \(L_{2,1}\)-norm regularization terms in the objective function \(\mathcal {F}\) of (8) makes the model difficult to optimize, as convergence cannot be guaranteed based on the analysis in [21]. To overcome this issue, we define a diagonal matrix \(\mathbf {Q}_{y} \in \mathfrak {R}^{c_p \times c_p}\) (one diagonal entry per row of \(\mathbf {Y}_{pk}\)), whose entries are calculated as follows:

$$\begin{aligned} (\mathbf {Q}_{y})_{ii}=\frac{1}{2||(\mathbf {Y}_{pk})_{i*}||_2} \end{aligned}$$
(15)

Thus, we can calculate the partial derivative of \(\mathcal {F}\) with respect to \(\mathbf {Y}_{pk}\) as follows:

$$\begin{aligned} \frac{\partial \mathcal {F}}{\partial \mathbf {Y}_{pk}}= -2 \mathbf {C}_p^T \mathbf {X}_{pk} \mathbf {C}_k + 2 \mathbf {C}_p^T \mathbf {C}_p \mathbf {Y}_{pk} \mathbf {C}_k^T \mathbf {C}_k + 2 \lambda _y \mathbf {Q}_{y} \mathbf {Y}_{pk} \end{aligned}$$
(16)

where the last term corresponds to the partial derivative of the \(L_{2,1}\) regularization term of \(\mathbf {Y}_{pk}\) in (8), with convergence guarantees [21]. As the joint objective function \(\mathcal {F}\) is subject to \(\mathbf {C}_k^T\mathbf {C}_k=\mathbf {I}\), \(\mathbf {C}_p^T\mathbf {C}_p=\mathbf {I}\), by setting the partial derivative of (16) equal to zero, we have:

$$\begin{aligned} -\mathbf {C}_p^T \mathbf {X}_{pk} \mathbf {C}_k + \mathbf {Y}_{pk} + \lambda _y \mathbf {Q}_y \mathbf {Y}_{pk}=0 \end{aligned}$$
(17)

which results in the following update rule for \(\mathbf {Y}_{pk}\):

$$\begin{aligned} \mathbf {Y}_{pk}=(\mathbf {I}+\lambda _y\mathbf {Q}_y)^{-1} \mathbf {C}_p^T \mathbf {X}_{pk} \mathbf {C}_k \end{aligned}$$
(18)

where \((\mathbf {I}+\lambda _y\mathbf {Q}_y)\) is a positive definite matrix.

Step 4, fix \(\mathbf {C}_k\), \(\mathbf {C}_p\), \(\mathbf {Y}_{pk}\) and update \(\mathbf {X}_{pk}\). Similarly, given the \(L_{2,1}\) regularization term of \(\mathbf {X}_{pk}\) in the joint objective function \(\mathcal {F}\), we define the diagonal matrix \(\mathbf {Q}_x \in \mathfrak {R}^{n_p \times n_p}\) (one diagonal entry per row of \(\mathbf {X}_{pk}\)), whose entries are computed as follows:

$$\begin{aligned} (\mathbf {Q}_{x})_{ii}=\frac{1}{2||(\mathbf {X}_{pk})_{i*}||_2} \end{aligned}$$
(19)

Then, we take the partial derivative of \(\mathcal {F}\) with respect to \(\mathbf {X}_{pk}\):

$$\begin{aligned} \frac{\partial \mathcal {F}}{\partial \mathbf {X}_{pk}}=2 \mathbf {X}_{pk}-2\mathbf {C}_p \mathbf {Y}_{pk}\mathbf {C}_k^T + 2\lambda _x \mathbf {Q}_x \mathbf {X}_{pk} \end{aligned}$$
(20)

By setting the partial derivative of (20) equal to zero, we obtain the following update rule for \(\mathbf {X}_{pk}\):

$$\begin{aligned} \mathbf {X}_{pk} = (\mathbf {I}+\lambda _x \mathbf {Q}_x)^{-1} \mathbf {C}_p \mathbf {Y}_{pk} \mathbf {C}_k^T \end{aligned}$$
(21)

Analysis. The alternating optimization is performed iteratively, where in each iteration matrices \(\mathbf {C}_k\), \(\mathbf {C}_p\), \(\mathbf {Y}_{pk}\) and \(\mathbf {X}_{pk}\) are updated based on (11), (14), (18) and (21), respectively. More precisely, at each iteration each variable/matrix is recalculated based on the values that the remaining three matrices took at the previous iteration, so that the four matrices reach a consensus solution over the iterations. The algorithm is repeated until it converges, which requires that the joint objective function \(\mathcal {F}\) in (8) monotonically decreases after each iteration. Based on [21], the \(L_{2,1}\)-norm regularization terms of \(\mathcal {F}\) become differentiable at zero by using the diagonal matrices \(\mathbf {Q}_y\) and \(\mathbf {Q}_x\) of (15) and (19) when updating \(\mathbf {Y}_{pk}\) and \(\mathbf {X}_{pk}\) in (18) and (21), respectively. By considering the optimality condition in each step, that is, setting the partial derivative of \(\mathcal {F}\) with respect to each variable equal to zero, the proof that the algorithm converges is similar to the convergence analysis of [11].
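Putting the four steps together, a condensed numpy sketch of the alternating optimization for a single auxiliary domain k. The small `eps` guard in the diagonal matrices and the omission of explicit nonnegativity/orthogonality projections are simplifications of the sketch, and all identifier names are illustrative.

```python
import numpy as np

def q_diag(M, eps=1e-12):
    """Diagonal matrices of (15)/(19): (Q)_ii = 1 / (2 ||M_i*||_2);
    eps guards against zero rows (an assumption of this sketch)."""
    row_norms = np.linalg.norm(M, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

def alternating_opt(A_p, A_k, C_p, C_k, Y_pk, X_pk,
                    beta_p, beta_k, lam_x, lam_y,
                    max_iter=50, tol=1e-6):
    """Alternating minimization of (8) via the updates (11), (14), (18), (21)."""
    L_p = np.diag(A_p.sum(1)) - A_p             # Laplacians of both domains
    L_k = np.diag(A_k.sum(1)) - A_k
    I_p, I_k = np.eye(L_p.shape[0]), np.eye(L_k.shape[0])
    prev = np.inf
    for _ in range(max_iter):
        # Step 1, eq. (11): update C_k
        C_k = np.linalg.solve(I_k + beta_k * L_k, X_pk.T @ C_p @ Y_pk)
        # Step 2, eq. (14): update C_p
        C_p = np.linalg.solve(I_p + beta_p * L_p, X_pk @ C_k @ Y_pk.T)
        # Step 3, eq. (18): update Y_pk
        Q_y = q_diag(Y_pk)
        Y_pk = np.linalg.solve(np.eye(Q_y.shape[0]) + lam_y * Q_y,
                               C_p.T @ X_pk @ C_k)
        # Step 4, eq. (21): update X_pk
        Q_x = q_diag(X_pk)
        X_pk = np.linalg.solve(np.eye(Q_x.shape[0]) + lam_x * Q_x,
                               C_p @ Y_pk @ C_k.T)
        # monitor the objective (8) for the monotone decrease discussed above
        obj = (np.linalg.norm(X_pk - C_p @ Y_pk @ C_k.T, 'fro') ** 2
               + lam_x * np.linalg.norm(X_pk, axis=1).sum()
               + lam_y * np.linalg.norm(Y_pk, axis=1).sum()
               + beta_p * np.trace(C_p.T @ L_p @ C_p)
               + beta_k * np.trace(C_k.T @ L_k @ C_k))
        if prev - obj < tol:
            break
        prev = obj
    return C_p, C_k, Y_pk, X_pk
```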

6 Generating Top-N Recommendations

The joint objective function \(\mathcal {F}\) for \(k=1,\dots ,d-1\) auxiliary domains can be extended to:

$$\begin{aligned} \begin{array}{c} \min \limits _{\mathbf {C}_p, \{\mathbf {C}_k,\mathbf {Y}_{pk},\mathbf {X}_{pk}\}_{k=1}^{d-1}} \sum \limits _{k=1}^{d-1} \Big ( ||\mathbf {X}_{pk}-\mathbf {C}_p\mathbf {Y}_{pk}\mathbf {C}_k^T||_F^2 + \lambda _{x}||\mathbf {X}_{pk}||_{2,1}+\lambda _{y}||\mathbf {Y}_{pk}||_{2,1} + \beta _k Tr(\mathbf {C}_k^T\mathbf {L}_k\mathbf {C}_k) \Big ) + \beta _p Tr(\mathbf {C}_p^T\mathbf {L}_p\mathbf {C}_p)\\ \text {subject to } \mathbf {C}_p^T\mathbf {C}_p=\mathbf {I}, \mathbf {C}_k^T\mathbf {C}_k=\mathbf {I}, \mathbf {Y}_{pk}^T\mathbf {Y}_{pk}=\mathbf {I}, \mathbf {C}_p, \mathbf {C}_k, \mathbf {Y}_{pk}, \mathbf {X}_{pk}\ge 0 \end{array} \end{aligned}$$
(22)

where the variables of \(\mathcal {F}\) are (i) the \(d-1\) cluster assignment matrices \(\mathbf {C}_k\) of the auxiliary domains; (ii) the cluster assignment matrix \(\mathbf {C}_p\) of the target domain p; and (iii) the \(d-1\) cluster-based matrices \(\mathbf {Y}_{pk}\) and the \(d-1\) user-based matrices \(\mathbf {X}_{pk}\). An overview of our approach is presented in Algorithm 1.

Algorithm 1. The proposed JCSL approach.
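Once the similarities in \(\mathbf {X}_{pk}\) have been learned and the factorization of (2) has been solved with the social regularization term (3), the final stage reduces to ranking the factorized scores \(\mathbf {\hat{R}}_p=\mathbf {U}_p\mathbf {V}_p^T\) per user; a minimal sketch, where excluding already-rated items is an assumed convention of the sketch:

```python
import numpy as np

def top_n(U_p, V_p, R_p, user, N=10):
    """Top-N items for one user from the factorized matrix U_p V_p^T."""
    scores = U_p[user] @ V_p.T            # predicted preferences for all items
    scores[R_p[user] != 0] = -np.inf      # exclude already-rated items
    return np.argsort(-scores)[:N]        # indices of the N highest scores
```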

7 Experimental Evaluation

7.1 Settings

Cross-domain recommendation tasks. Our experiments were performed on ten cross-domain tasks from the Rich Epinions Dataset (RED), which contains 131,228 users, 317,775 items and 1,127,673 user preferences, in the form of ratings. The items are grouped in categories/domains, and we evaluate the performance of cross-domain recommendation on the 10 largest domains. The evaluation data were provided by the authors of [20]. The main characteristics of the ten cross-domain recommendation tasks are presented in Table 2. The evaluation tasks are challenging, as the domains have no item overlaps, but only user overlaps. In each cross-domain recommendation task, we consider one target domain and the remaining nine serve as auxiliary domains. For each task we preserve all the ratings of the auxiliary domains, and we randomly select 25 %, 50 % and 75 % of the target domain as training set [20]. For each split, the remaining ratings of the target domain are considered as test set. We repeated our experiments five times, and we report mean values and standard deviations over the runs.

Table 2. The ten cross-domain recommendation tasks.

Compared methods. In our experiments, we evaluate the performance of the following methods:

  • NMF [14]: a single-domain baseline Nonnegative Matrix Factorization method, which generates recommendations based only on the ratings of the target domain, ignoring the ratings in the auxiliary domains.

  • CLFM [8]: a cross-domain Cluster-based Latent Factor Model which uses joint nonnegative tri-factorization to construct a latent space to represent the rating patterns of user clusters on the item clusters from each domain, and then generates the cross-domain recommendations based on a subspace learning strategy.

  • CBMF [20]: a cross-domain Cluster-based Matrix Factorization model, which defines a coarse cross-domain matrix to capture the shared preferences between user and item clusters in the multiple domains, and then reveals the latent associations at a cluster level by factorizing the coarse cross-domain matrix. The final recommendations are generated by linearly combining the cluster-based latent associations and the user-based latent associations in the target domain. CBMF controls the influence of the cluster-based relationships on the personalized recommendations based on a parameter \(\alpha \).

  • JCSL: the proposed Joint cross-domain user Clustering and Similarity Learning model.

In all models we varied the number of latent factors in [10, 100] with a step of 10. In the cross-domain methods of CLFM, CBMF and JCSL we fixed the number of clusters to 100, as suggested in [20]. The predefined clusters in both the CLFM and CBMF methods are computed by performing k-means, as also done in [8, 20]. Similarly, in the proposed JCSL method, the clusters are initialized by the k-means algorithm (Sect. 5). Following [8], in CLFM we tuned the dimensionality of the subspace up to the minimum number of latent factors of the multiple domains. In CBMF, we varied the \(\alpha \) parameter in [0, 1], where lower values of \(\alpha \) give less weight to the cluster-based relationships when computing the top-N recommendations. In JCSL the maximum number of iterations of the alternating optimization algorithm is fixed to 50, and the regularization parameters of the objective function in (22) were varied in [0.0001, 0.1]. In all examined models, the parameters were determined via cross-validation, and we report the best results.

Evaluation protocol. Popular commercial systems make top-N recommendations to users, and relevant studies showed that rating error metrics, such as RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error), do not necessarily reflect the top-N recommendation performance [5]. Therefore, in our experiments we used the ranking-based metrics Recall and Normalized Discounted Cumulative Gain to evaluate the top-N performance of the examined models directly [20]. Recall (R@N) is defined as the ratio of the relevant items in the top-N ranked list over all the relevant items for each user. The Normalized Discounted Cumulative Gain (NDCG@N) metric considers the ranking of the relevant items in the top-N list. For each user the Discounted Cumulative Gain (DCG@N) is defined as:

$$\begin{aligned} DCG@N = \sum _{j=1}^{N}{ \frac{2^{rel_j}-1}{\log _2{(j+1)}} } \end{aligned}$$
(23)

where \(rel_j\) represents the relevance score of the item j to the user. NDCG@N is the ratio of DCG@N over the ideal iDCG@N value for each user, that is, the DCG@N value given the ratings in the test set. In our experiments we averaged R@N and NDCG@N over all users.
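For reference, a small sketch of both metrics as defined above; the dictionary-based relevance encoding (test-set rating of an item as its relevance score) and the per-user call signature are assumptions of the sketch:

```python
import numpy as np

def recall_at_n(ranked, relevant, N):
    """R@N: fraction of the user's relevant items found in the top-N list."""
    hits = sum(1 for item in ranked[:N] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_n(ranked, rel, N):
    """NDCG@N of (23): DCG of the produced ranking over the ideal DCG
    obtained by sorting the test-set relevance scores rel[item]."""
    dcg = sum((2 ** rel.get(item, 0) - 1) / np.log2(j + 2)  # j+2: ranks start at 1
              for j, item in enumerate(ranked[:N]))
    ideal = sorted(rel.values(), reverse=True)[:N]
    idcg = sum((2 ** r - 1) / np.log2(j + 2) for j, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Both metrics are then averaged over all users, as in the protocol above.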

7.2 Results

In the first set of experiments, we use 75 % of the target domain as training set, while the remainder is considered as test set. Table 3 presents the experimental results in terms of NDCG@10. The cross-domain methods of CLFM, CBMF and JCSL significantly outperform the single-domain NMF method by exploiting the auxiliary domains when generating the recommendations. This happens because the cross-domain methods incorporate the additional information of user preferences on items from the auxiliary domains, thus reducing the data sparsity in the target domain. The proposed JCSL method achieves an 8.95 % improvement on average over the second best method. Using the paired t-test, we found that JCSL is superior to the other approaches for \(p<0.05\). JCSL beats the competing strategies as it exploits the cluster-based similarities more effectively than the other cluster-based models. Jointly adapting the user clusters while computing the user-based and cluster-based similarities enables JCSL to efficiently incorporate the additional information of user preferences in the auxiliary domains. On the other hand, CLFM uses a subspace learning strategy on non-adaptive clusters in a common latent space. Finally, CBMF linearly combines the cluster-based and the individual latent associations by capturing the user preferences in the auxiliary domains based on predefined clusters. In this set of experiments, there is the exceptional case of the "Baby Care" cross-domain task, where the proposed method performs similarly to CBMF. This happens because "Baby Care" is the least sparse domain, as presented in Table 2. Figure 1 compares the examined models in terms of recall (R@N), by varying the number of the top-N recommendations. Similarly, JCSL achieves a 12.17 % improvement on average over all the cross-domain recommendation tasks.

Table 3. Effect on NDCG@10 for the ten cross-domain recommendation tasks, using 75 % of the target domain as training set. Bold values denote the best scores, for \(^*p<0.05\) in paired t-test. The last column denotes the relative improvement (%), when comparing JCSL with the second best method (CBMF).
Fig. 1. Effect on Recall (R@N) by varying the number of the top-N recommendations. In the ten cross-domain tasks, 75 % of the target domain is considered as training set.

To evaluate the performance of the examined methods when sparsity increases, the training set is reduced to 25 % of the target domain, while keeping all the ratings of the auxiliary domains. Table 4 reports the experimental results in terms of recall (R@10) on the reduced training sets. Compared with the results of Fig. 1, recall drops for all methods due to the increased sparsity. As we can observe, the proposed JCSL method achieves relatively high recall. In all cross-domain recommendation tasks, JCSL is superior to the competing cross-domain strategies (for \(p<0.05\)), achieving an average relative improvement of 14.49 % over the second best method.

Table 4. Effect on Recall (R@10) for the ten cross-domain recommendation tasks, using 25 % of the target domain as training set. Bold values denote the best scores, for \(^*p<0.05\) in paired t-test. The last column denotes the relative improvement (%), when comparing JCSL with the second best method (CBMF).

Figure 2 shows the effect on NDCG@10 of the cross-domain recommendation models, by varying the training sizes of three representative target domains, which are at different scales (Table 2). For presentation purposes, the baseline NMF method is omitted from this set of experiments, due to its low performance. We observe that all cross-domain methods achieve higher NDCG when a larger training set is used. Figure 2 shows that the proposed JCSL method keeps NDCG relatively high in all settings, while outperforming CLFM and CBMF. The three cross-domain models differ in how the knowledge of user preferences is transferred between the domains when generating the recommendations, which explains their different performance when decreasing the training set size. JCSL adapts the clustering in each domain separately while computing the cross-domain cluster-based similarities, whereas CLFM and CBMF compute the similarities between predefined/non-adaptive clusters when generating the recommendations.

Fig. 2. Effect on NDCG@10 by varying the size of the training set.

8 Conclusion

In this paper, we presented an efficient cross-domain recommendation algorithm based on a joint strategy that adapts the user clusters while calculating the user-based and cluster-based similarities across multiple domains. The joint optimization problem is solved via an efficient alternating optimization algorithm. Our experiments on ten cross-domain tasks confirmed the superiority of the proposed approach over competitive cross-domain strategies at different levels of sparsity. The main advantage of our approach is that JCSL adapts the clusters in each domain separately while computing the cross-domain cluster-based similarities, whereas the competitors compute the similarities between predefined/non-adaptive clusters when generating the recommendations. Moreover, instead of linearly combining the cluster-based and user-based similarities, as for example CBMF does, JCSL jointly learns both types of similarities. In this study we considered partial user overlaps, with the mapping of users between the different domains being known. An interesting future direction is to extend the proposed approach to the case of unknown user matchings across multiple domains [17]. In addition, it would be worth evaluating the performance of different clustering algorithms, such as spherical k-means [25] or power iteration [18], for initializing the clusters in the different domains.