1 Introduction

With the rapid growth of public cloud offerings, service-oriented systems are becoming large-scale and complex, which makes it more difficult for users to find appropriate services. Recommendation has become an active research topic as the volume of available information grows, and considerable effort has been devoted in both industry and academia to developing effective recommender systems [1, 2]. Collaborative filtering (CF) has proved to be one of the most successful recommendation methods for dealing with information overload in the real world [3]. However, most existing CF recommendation models make recommendations directly from rating models without considering context information such as temporal influence. In practice, because of business competition and the temporal dynamics of the cloud environment, most cloud services are time-sensitive.

Recently, some studies [4, 5] have investigated the importance of temporal influence in recommender systems, focusing mainly on the temporal dynamics of user preferences. Nevertheless, few works pay attention to the temporal influence on Quality of Service (QoS). Compared with traditional internet services, the QoS of cloud services is more sensitive to time because of the dynamics of the cloud environment. The correlation between cloud services and user behaviors may weaken over long periods because of these temporal dynamics, so it is important to consider the temporal impact on the QoS of cloud services. In addition, data sparsity is a persistent problem that degrades the performance of recommendation methods: when ratings are sparse, the similarity models used in collaborative methods fail to capture the similarity relationships between users or services effectively.

Based on these observations, this paper proposes a temporal-sparsity aware service recommendation method based on hybrid CF techniques. First, temporal influence is incorporated into the neighborhood-based CF recommendation model by distinguishing temporal QoS metrics from stable QoS metrics. Second, to address the data sparsity problem and adaptively mine the similarity relationships between services over time, a time-aware latent factor model based on tensor decomposition is presented. Finally, experiments on a real-world service dataset are designed and conducted to validate the effectiveness of our proposal.

2 Problem Formulation

In this section, some important concepts and definitions are presented. First, to mine the temporal features of QoS, the QoS metrics of services are divided into two parts, i.e., stable QoS metrics and temporal QoS metrics:

Stable QoS Metrics and Temporal QoS Metrics:

Stable QoS metrics are metrics that evolve slowly and represent regular features (such as availability). Temporal QoS metrics are metrics that have clear temporal features and show dynamic trends over time (such as price and duration).

To analyze the temporal influence on recommendation performance, the history period of the rating dataset is divided into K time slots, i.e., \( \{ T_{1} ,\,T_{2} ,\, \ldots ,\,T_{K} \} \). Temporal QoS metrics are distinguished from stable QoS metrics by measuring how much the rating of each individual QoS metric fluctuates over time. The fluctuation of QoS metric \( q_{h} \) is measured by the variance of the QoS rating for \( q_{h} \) over the K time slots, as defined in Eq. (1):

$$ F(q_{h} ) = \frac{1}{K \cdot \left| S \right|}\sum\limits_{s = 1}^{\left| S \right|} {\sum\limits_{k = 1}^{K} {(\overline{rq}_{hk}^{s} - \overline{rq}_{h}^{s} )^{2} } } $$
(1)

where \( \left| S \right| \) is the number of services in service set S, \( \overline{rq}_{hk}^{s} \) is the average rating of metric \( q_{h} \) for service s in time slot \( T_{k} \), and \( \overline{rq}_{h}^{s} \) is the overall average rating of \( q_{h} \) for service s over the whole time period.

A fluctuation threshold \( \delta_{t} \) is used to decide whether a QoS metric is stable or temporal. If \( F(q_{h} ) > \delta_{t} \), then \( q_{h} \) is treated as a temporal QoS metric; otherwise \( q_{h} \) is treated as a stable metric. The QoS metric vector \( \varvec{Q} = [q_{1} ,q_{2} ,\, \ldots ,q_{H} ] \) can therefore be divided into two parts: the stable QoS vector \( \varvec{SQ} = [sq_{1} ,sq_{2} ,\, \ldots ,sq_{a} ] \) and the temporal QoS vector \( \varvec{TQ} = [tq_{1} ,tq_{2} ,\, \ldots ,tq_{b} ] \), where a + b = H.
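The split defined by Eq. (1) and the threshold \( \delta_{t} \) can be illustrated with a minimal Python sketch. The data layout (a dictionary of per-slot average ratings for each metric and service) and the function names are assumptions made for this example, and the overall average \( \overline{rq}_{h}^{s} \) is approximated here by the mean of the per-slot averages, which is exact only when every slot holds the same number of ratings.

```python
from typing import Dict, List, Tuple

# ratings[metric][service] = list of K per-slot average ratings.
# This layout is hypothetical and chosen only for the sketch.
Ratings = Dict[str, Dict[str, List[float]]]


def fluctuation(per_service: Dict[str, List[float]]) -> float:
    """Eq. (1): mean squared deviation of per-slot averages from each
    service's overall average, taken over all services and slots."""
    total = 0.0
    count = 0  # equals K * |S| when every service has K slots
    for slot_means in per_service.values():
        # Assumption: overall average = mean of the per-slot averages.
        overall = sum(slot_means) / len(slot_means)
        total += sum((m - overall) ** 2 for m in slot_means)
        count += len(slot_means)
    return total / count


def split_metrics(ratings: Ratings, delta_t: float) -> Tuple[List[str], List[str]]:
    """Return (stable metrics, temporal metrics) for threshold delta_t."""
    stable: List[str] = []
    temporal: List[str] = []
    for metric, per_service in ratings.items():
        if fluctuation(per_service) > delta_t:
            temporal.append(metric)
        else:
            stable.append(metric)
    return stable, temporal


example: Ratings = {
    "availability": {"s1": [4.0, 4.0, 4.0], "s2": [3.0, 3.0, 3.0]},
    "price": {"s1": [1.0, 3.0, 5.0], "s2": [2.0, 2.0, 2.0]},
}
stable_metrics, temporal_metrics = split_metrics(example, delta_t=0.5)
```

In this toy input the availability ratings never change across slots, so only price exceeds the threshold and is classified as temporal.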

Global Nearest Neighbors:

For a candidate service s, its global nearest neighbors are the services whose QoS ratings are most similar to those of service s over the whole time period of the history records.

Temporal Nearest Neighbors:

The temporal nearest neighbors of service s are the services that have the most similar QoS ratings with service s at time slot \( T_{k} \) and its similar time slots, i.e., \( T_{sim} (k) \). (The definition of \( T_{sim} (k) \) is given in Sect. 3.1).

3 Temporal-Sparsity Aware Service Recommendation Method

To make more appropriate recommendations among time-sensitive cloud services, we propose a temporal-sparsity aware service recommendation method based on hybrid CF techniques. The method is organized as a three-phase process: (1) time slot aggregation, (2) similarity calculation and similarity prediction, and (3) rating prediction. Each phase is described in detail below.

3.1 Time Slot Aggregation

To make more accurate recommendations, we base predictions on the QoS ratings at the target time slot. However, the ratings of the target user at the target time slot may be very sparse. We therefore provide an aggregation strategy that merges the target time slot with the time slots similar to it.

A temporal similarity coefficient \( \varphi (T_{i} ,T_{j} ) \) is defined to measure the closeness of the temporal QoS metrics between time slots \( T_{i} \) and \( T_{j} \). As shown in Eq. (2), \( \varphi (T_{i} ,T_{j} ) \in [0,\,1] \) is defined based on the Pearson Correlation Coefficient (PCC). The larger \( \varphi (T_{i} ,T_{j} ) \) is, the closer the temporal features of the candidate services in \( T_{i} \) and \( T_{j} \) are. A threshold \( \theta \) determines whether two time slots are similar: if \( \varphi (T_{i} ,T_{j} ) \ge \theta \), then \( T_{i} \) and \( T_{j} \) are considered similar. The set of time slots similar to \( T_{k} \) is denoted \( T_{sim} (k) \), with \( T_{k} \subseteq T_{sim} (k) \).

$$ \varphi (T_{i} ,T_{j} ) = \frac{{\sum\nolimits_{{s \in S(T_{i} ) \cap S(T_{j} )}} {(\overline{{\varvec{RT}}}_{is} - \overline{{\varvec{RT}}}_{i} ) \bullet (\overline{{\varvec{RT}}}_{js} } - \overline{{\varvec{RT}}}_{j} )}}{{\sqrt {\sum\nolimits_{{s \in S(T_{i} ) \cap S(T_{j} )}} {\left\| {\overline{{\varvec{RT}}}_{is} - \overline{{\varvec{RT}}}_{i} } \right\|^{2} } } \cdot \sqrt {\sum\nolimits_{{s \in S(T_{i} ) \cap S(T_{j} )}} {\left\| {\overline{{\varvec{RT}}}_{js} - \overline{{\varvec{RT}}}_{j} } \right\|^{2} } } }} $$
(2)

In Eq. (2), \( S(T_{i} ) \cap S(T_{j} ) \) is the set of services invoked by users in both time slots \( T_{i} \) and \( T_{j} \), \( \overline{{\varvec{RT}}}_{is} \) is the average temporal QoS-rating vector of service s at \( T_{i} \), and \( \overline{{\varvec{RT}}}_{i} \) is the average temporal QoS-rating vector of all candidate services in \( S(T_{i} ) \cap S(T_{j} ) \) at \( T_{i} \).
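The aggregation step can be sketched in Python as follows. The representation of a time slot (a dictionary mapping a service identifier to its average temporal QoS-rating vector) and the function names are assumptions made for illustration. The sketch returns the raw PCC value, which lies in [-1, 1]; the text does not state how values are mapped into [0, 1], so no mapping is applied here, and a zero denominator is treated as "no similarity" by assumption.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
# slot[service] = average temporal QoS-rating vector of that service in
# the slot. Hypothetical layout for this sketch.
Slot = Dict[str, Vector]


def _mean_vector(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def temporal_similarity(slot_i: Slot, slot_j: Slot) -> float:
    """Eq. (2): PCC over the services invoked in both time slots."""
    common = sorted(set(slot_i) & set(slot_j))
    if not common:
        return 0.0  # assumption: no shared services means no similarity
    mean_i = _mean_vector([slot_i[s] for s in common])
    mean_j = _mean_vector([slot_j[s] for s in common])
    num = den_i = den_j = 0.0
    for s in common:
        d_i = [x - m for x, m in zip(slot_i[s], mean_i)]
        d_j = [x - m for x, m in zip(slot_j[s], mean_j)]
        num += sum(a * b for a, b in zip(d_i, d_j))
        den_i += sum(a * a for a in d_i)
        den_j += sum(b * b for b in d_j)
    if den_i == 0.0 or den_j == 0.0:
        return 0.0  # assumption: constant ratings carry no correlation
    return num / (math.sqrt(den_i) * math.sqrt(den_j))


def similar_slots(slots: List[Slot], k: int, theta: float) -> List[int]:
    """Indices of the slots merged into T_sim(k); slot k is always kept."""
    return [
        j for j in range(len(slots))
        if j == k or temporal_similarity(slots[k], slots[j]) >= theta
    ]


slots: List[Slot] = [
    {"a": [1.0, 2.0], "b": [3.0, 4.0]},
    {"a": [2.0, 4.0], "b": [6.0, 8.0]},
    {"a": [3.0, 4.0], "b": [1.0, 2.0]},
]
merged = similar_slots(slots, k=0, theta=0.8)
```

Here the second slot follows the same trend as the first and is merged with it, while the third slot reverses the trend and is excluded.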

3.2 Similarity Calculation and Similarity Prediction

As presented in Sect. 2, there are two kinds of nearest neighbors in our work, i.e., global nearest neighbors (denoted as \( S_{GNN} \)) and temporal nearest neighbors (denoted as \( S_{TNN} \)). The global nearest neighbors of a service s are determined by Eq. (3):

$$ sim_{sv}^{GNN} = \frac{{\sum\nolimits_{u \in U(s) \cap U(v)} {\left( {\varvec{RQ}_{us} - \overline{{\varvec{RQ}}}_{s} } \right) \bullet \left( {\varvec{RQ}_{uv} - \overline{{\varvec{RQ}}}_{v} } \right)} }}{{\sqrt {\sum\nolimits_{u \in U(s) \cap U(v)} {\left\| {\varvec{RQ}_{us} - \overline{{\varvec{RQ}}}_{s} } \right\|^{2} \times \left\| {\varvec{RQ}_{uv} - \overline{{\varvec{RQ}}}_{v} } \right\|^{2} } } }} $$
(3)

where \( U(s) \cap U(v) \) is the set of users that rated both service s and service v in the whole time period. Here, we give a preset similarity threshold \( \delta_{sim} \), then the services that have similarity with service s no less than \( \delta_{sim} \) can be considered as global nearest neighbors of service s.

Similarly, the temporal nearest neighbors of service s can be determined by Eq. (4). The temporal nearest neighbors of service s at time slot Tk are the services that have similarity with service s no less than \( \delta_{sim} \) at time slot \( T_{k} \) and its similar time slots, i.e., \( T_{sim} (k) \).

$$ sim_{sv}^{TNN} (T_{k} ) = \frac{{\sum\nolimits_{{u \in U_{sv} (T_{k} )}} {\left( {\varvec{RQ}_{us} - \overline{{\varvec{RQ}}}_{s} } \right) \bullet \left( {\varvec{RQ}_{uv} - \overline{{\varvec{RQ}}}_{v} } \right)} }}{{\sqrt {\sum\nolimits_{{u \in U_{sv} (T_{k} )}} {\left\| {\varvec{RQ}_{us} - \overline{{\varvec{RQ}}}_{s} } \right\|^{2} \times \left\| {\varvec{RQ}_{uv} - \overline{{\varvec{RQ}}}_{v} } \right\|^{2} } } }} $$
(4)

where \( U_{sv} (T_{k} ) = \{ u |u \in U(s) \cap U(v)\,\& \,t_{us} \in T_{sim} (k)\,\& \,t_{uv} \in T_{sim} (k)\} \), which is the set of users that rated both service s and service v at \( T_{sim} (k) \).
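A minimal Python sketch of the neighbor similarity follows, covering both the global case and the temporal case restricted to \( U_{sv} (T_{k} ) \). Several points are assumptions of this example rather than statements from the text: the data layout, one rating per user and service, the use of each service's overall mean vector for \( \overline{{\varvec{RQ}}}_{s} \), and the normalization. In particular, the denominator uses the standard PCC form (the product of two square-rooted sums, as in Eq. (2)), which differs from the denominator as printed in Eqs. (3) and (4).

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Vector = List[float]
# ratings[service][user] = (time slot index, QoS rating vector).
# Hypothetical layout: each user rates a service at most once.
Ratings = Dict[str, Dict[str, Tuple[int, Vector]]]


def mean_vector(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def service_similarity(ratings: Ratings, s: str, v: str,
                       sim_slots: Optional[Set[int]] = None) -> float:
    """PCC between services s and v over their co-rating users.

    sim_slots=None uses the whole history (global case); otherwise only
    users whose ratings of both services fall in sim_slots are kept.
    """
    # Assumption: the mean vectors are taken over all ratings of a service.
    mean_s = mean_vector([vec for _, vec in ratings[s].values()])
    mean_v = mean_vector([vec for _, vec in ratings[v].values()])
    num = den_s = den_v = 0.0
    for user, (slot_s, vec_s) in ratings[s].items():
        if user not in ratings[v]:
            continue
        slot_v, vec_v = ratings[v][user]
        if sim_slots is not None and not (slot_s in sim_slots and slot_v in sim_slots):
            continue
        d_s = [x - m for x, m in zip(vec_s, mean_s)]
        d_v = [x - m for x, m in zip(vec_v, mean_v)]
        num += sum(a * b for a, b in zip(d_s, d_v))
        den_s += sum(a * a for a in d_s)
        den_v += sum(b * b for b in d_v)
    if den_s == 0.0 or den_v == 0.0:
        return 0.0  # assumption: undefined correlation counts as zero
    return num / (math.sqrt(den_s) * math.sqrt(den_v))


toy: Ratings = {
    "s": {"u1": (0, [5.0]), "u2": (0, [3.0]), "u3": (1, [1.0])},
    "v": {"u1": (0, [4.0]), "u2": (0, [2.0]), "u3": (2, [3.0])},
}
global_sim = service_similarity(toy, "s", "v")
temporal_sim = service_similarity(toy, "s", "v", sim_slots={0})
```

A service v would then be kept as a neighbor of s when the returned value is no less than \( \delta_{sim} \).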

To further alleviate the sparsity problem, a time-aware latent factor model based on CANDECOMP/PARAFAC (CP) decomposition [6] is applied to predict the temporal similarity between services. The triadic relations among services, neighbors and time features are formulated as a three-dimensional similarity tensor \( \varvec{Sim} \in \Re^{{{\text{M}} \times {\text{M}} \times K}} \). An element of tensor Sim is denoted as \( sim_{ijk} \), which represents the temporal similarity of service i and service j at time slot \( T_{k} \). The tensor Sim is symmetric in its first two modes, since \( sim_{ijk} = sim_{jik} \). Based on the CP decomposition model, the tensor \( \varvec{Sim} \in \Re^{{{\text{M}} \times {\text{M}} \times K}} \) can then be decomposed into a sum of rank-one component tensors:

$$ \varvec{Sim} \approx \sum\limits_{r = 1}^{R} {s_{r} \circ s_{r} \circ t_{r} } $$
(5)

where R is the rank of tensor Sim, defined as the smallest number of rank-one tensors whose sum reproduces it, and \( s_{r} \) and \( t_{r} \) are the latent factor vectors associated with services and time, respectively.

The temporal similarity can then be predicted by Eq. (6). The observed temporal similarity is broken into two components: biases and the service-neighbor-time interaction. The bias component contains the overall average similarity \( \mu \) and the time bias \( b_{tk} \). Service bias is not considered in similarity prediction.

$$ \hat{s}im_{ijk} = \mu + b_{tk} + \sum\limits_{r = 1}^{R} {s_{ir} \cdot s_{jr} \cdot t_{kr} } $$
(6)

To learn the involved parameter \( b_{tk} \) and the involved vectors, i.e., \( s_{ir} \), \( s_{jr} \) and \( t_{kr} \), we minimize the regularized squared error function:

$$ \mathop {min}\limits_{b,\,s,\,t} \sum\limits_{(i,j,k) \in Train} {\left\| {sim_{ijk} - {\kern 1pt} \hat{s}im_{ijk} } \right\|^{2} {\kern 1pt} + \lambda W} $$
(7)

where Train is the training set, i.e., the set of (i, j, k) triples for which \( sim_{ijk} \) is known, and each \( sim_{ijk} \) is obtained by Eq. (4). The term \( W = b_{tk}^{2} + \left\| {s_{ir} } \right\|^{2} + \left\| {s_{jr} } \right\|^{2} + \left\| {t_{kr} } \right\|^{2} \) regularizes the learned parameters to avoid overfitting, and the constant \( \lambda \) controls the extent of regularization. In this paper, we adopt stochastic gradient descent to solve Eq. (7) by looping through all similarity values in the training set.
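The learning procedure for Eqs. (6) and (7) can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the authors' implementation: the class name, the learning rate, the Gaussian initialization, and the toy training triples are all hypothetical, the constant factor 2 of the gradient is absorbed into the learning rate, and every training triple is assumed to have i ≠ j.

```python
import random
from typing import List, Tuple

Sample = Tuple[int, int, int, float]  # (i, j, k, observed sim_ijk)


class TemporalSimilarityModel:
    """Eq. (6): sim_ijk ~ mu + b_t[k] + sum_r s[i][r] * s[j][r] * t[k][r]."""

    def __init__(self, n_services: int, n_slots: int, rank: int,
                 mu: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.mu = mu
        self.b_t = [0.0] * n_slots
        # Small random initial factors (an assumption of this sketch).
        self.s = [[rng.gauss(0.0, 0.1) for _ in range(rank)]
                  for _ in range(n_services)]
        self.t = [[rng.gauss(0.0, 0.1) for _ in range(rank)]
                  for _ in range(n_slots)]

    def predict(self, i: int, j: int, k: int) -> float:
        interaction = sum(si * sj * tk for si, sj, tk
                          in zip(self.s[i], self.s[j], self.t[k]))
        return self.mu + self.b_t[k] + interaction

    def sgd_epoch(self, train: List[Sample], lr: float, lam: float) -> None:
        """One pass of stochastic gradient descent on Eq. (7)."""
        for i, j, k, sim in train:  # assumes i != j
            err = sim - self.predict(i, j, k)
            self.b_t[k] += lr * (err - lam * self.b_t[k])
            for r in range(len(self.t[k])):
                si, sj, tk = self.s[i][r], self.s[j][r], self.t[k][r]
                self.s[i][r] += lr * (err * sj * tk - lam * si)
                self.s[j][r] += lr * (err * si * tk - lam * sj)
                self.t[k][r] += lr * (err * si * sj - lam * tk)


def squared_error(model: TemporalSimilarityModel, train: List[Sample]) -> float:
    return sum((sim - model.predict(i, j, k)) ** 2 for i, j, k, sim in train)


# Toy similarities: high in slot 0, low in slot 1.
train: List[Sample] = [
    (0, 1, 0, 0.9), (0, 2, 0, 0.8), (1, 2, 0, 0.9),
    (0, 1, 1, 0.1), (0, 2, 1, 0.2), (1, 2, 1, 0.1),
]
mu = sum(sample[3] for sample in train) / len(train)
model = TemporalSimilarityModel(n_services=3, n_slots=2, rank=2, mu=mu)
error_before = squared_error(model, train)
for _ in range(200):
    model.sgd_epoch(train, lr=0.05, lam=0.01)
error_after = squared_error(model, train)
```

Because the same factor matrix is used for both service modes, the prediction is symmetric in i and j, which matches the symmetry \( sim_{ijk} = sim_{jik} \) of the tensor. Once trained, `model.predict(i, j, k)` can supply similarity values for triples that are missing from the training set.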

3.3 Rating Prediction

The prediction of target user i on candidate service j at Tk (denoted as \( r_{ijk} \)) is defined in Eq. (8). The prediction consists of two parts, i.e., prediction based on the global nearest neighbors and prediction based on the temporal nearest neighbors, which are combined by a weight coefficient \( \alpha \). In our proposal, \( \alpha \) is set as \( a/H \), and \( 1 - \alpha = b/H \).

$$ r_{ijk} = \alpha \cdot \left( {\bar{r}_{j} + \frac{{\sum\nolimits_{{s \in S_{GNN} (j)}} {(r_{is} - \bar{r}_{s} ) \cdot sim_{js}^{GNN} } }}{{\sum\nolimits_{{s \in S_{GNN} (j)}} {\left| {sim_{js}^{GNN} } \right|} }}} \right) + \left( {1 - \alpha } \right) \cdot \left( {\bar{r}_{j}^{k} + \frac{{\sum\nolimits_{{s' \in S_{TNN} (j,T_{k} )}} {(r_{is'} - \bar{r}_{s'} ) \cdot sim_{js'}^{TNN} (T_{k} )} }}{{\sum\nolimits_{{s' \in S_{TNN} (j,T_{k} )}} {\left| {sim_{js'}^{TNN} (T_{k} )} \right|} }}} \right) $$
(8)

where \( \bar{r}_{j} \) is the overall average rating of service j, \( \bar{r}_{j}^{k} \) is the average rating of service j in the aggregated time slot \( T_{sim} (k) \), \( S_{GNN} (j) \) and \( S_{TNN} (j,T_{k} ) \) are the sets of global and temporal nearest neighbors of service j that have been used by user i, \( r_{is} \) is the rating of user i on service s, and \( \bar{r}_{s} \) and \( \bar{r}_{s'} \) are the overall average ratings of the neighbor services \( s \in S_{GNN} (j) \) and \( s' \in S_{TNN} (j,T_{k} ) \), respectively.
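The combination in Eq. (8) can be sketched in Python as follows. The function names and the dictionary-based inputs are assumptions made for this example, and falling back to the baseline average when the user has rated no neighbor is a choice of the sketch that the text does not specify.

```python
from typing import Dict


def neighbor_term(base: float,
                  user_ratings: Dict[str, float],
                  service_means: Dict[str, float],
                  sims: Dict[str, float]) -> float:
    """Baseline average plus the similarity-weighted deviation of the
    user's ratings on the neighbors of the candidate service."""
    num = 0.0
    den = 0.0
    for neighbor, sim in sims.items():
        if neighbor not in user_ratings:
            continue  # only neighbors already used by the target user count
        num += (user_ratings[neighbor] - service_means[neighbor]) * sim
        den += abs(sim)
    # Assumption: with no usable neighbor, return the baseline unchanged.
    return base if den == 0.0 else base + num / den


def predict_rating(a: int, b: int,
                   global_base: float, temporal_base: float,
                   user_ratings: Dict[str, float],
                   service_means: Dict[str, float],
                   gnn_sims: Dict[str, float],
                   tnn_sims: Dict[str, float]) -> float:
    """Eq. (8) with alpha = a / H and 1 - alpha = b / H, where H = a + b."""
    alpha = a / (a + b)
    global_part = neighbor_term(global_base, user_ratings, service_means, gnn_sims)
    temporal_part = neighbor_term(temporal_base, user_ratings, service_means, tnn_sims)
    return alpha * global_part + (1.0 - alpha) * temporal_part


prediction = predict_rating(
    a=3, b=1,                      # three stable metrics, one temporal metric
    global_base=4.0,               # overall average rating of service j
    temporal_base=3.0,             # average rating of j in T_sim(k)
    user_ratings={"s1": 5.0, "s2": 3.0},
    service_means={"s1": 4.0, "s2": 4.0},
    gnn_sims={"s1": 1.0, "s2": 0.5},
    tnn_sims={"s1": 0.5},
)
```

With these toy inputs the global part evaluates to 4.0 + 0.5/1.5 and the temporal part to 3.0 + 0.5/0.5, so the weighted result is 0.75 · 4.333… + 0.25 · 4.0 = 4.25.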

4 Experiment

4.1 Experimental Setup

Datasets:

We employ a real-world service dataset to simulate the historical QoS ratings of services in the cloud market. The dataset is collected from a well-known travel review site (www.tripadvisor.com). We use five-fold cross-validation, so that in each fold the dataset is split into 20% test data and 80% training data.
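A five-fold split of this kind can be sketched in a few lines of Python. The function name, the shuffling seed, and the use of integers as stand-ins for rating records are assumptions of this example; the text does not describe how the folds were actually drawn.

```python
import random
from typing import Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(records: List[T], folds: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; each record is tested exactly once."""
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)
    for fold in range(folds):
        held_out = set(order[fold::folds])  # every folds-th shuffled index
        train = [records[i] for i in order if i not in held_out]
        test = [records[i] for i in order[fold::folds]]
        yield train, test


records = list(range(10))  # stand-ins for rating records
splits = list(k_fold_splits(records))
```

With five folds, each test part holds one fifth of the records and the matching training part holds the remaining four fifths.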

Comparative Approaches:

To evaluate the effectiveness of our proposal, we compare our method with four other approaches: Item-based CF algorithm using PCC (IPCC) [7], regularized Singular Value Decomposition (RSVD) [8], a temporal QoS-aware web service recommendation method via tensor factorization (TWS) [9], and a time-aware hybrid collaborative recommendation method (THC) [10].

Performance Metrics:

Four widely used metrics are applied to evaluate the statistical accuracy of recommendation approaches: mean absolute error (MAE), root-mean-square error (RMSE) [11], precision and recall [12].
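For reference, the four metrics can be computed as in the Python sketch below. The function names are hypothetical, and the notion of a "relevant" service used for precision@N and recall@N (for example, a service the user rated highly in the test data) is an assumption, since the relevance criterion is not stated here.

```python
import math
from typing import List, Sequence, Set, Tuple


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted ratings."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def precision_recall_at_n(recommended: List[str], relevant: Set[str],
                          n: int) -> Tuple[float, float]:
    """precision@N and recall@N for a ranked recommendation list."""
    hits = sum(1 for service in recommended[:n] if service in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


example_mae = mae([4.0, 3.0, 5.0], [3.0, 3.0, 3.0])
example_rmse = rmse([4.0, 3.0, 5.0], [3.0, 3.0, 3.0])
example_pr = precision_recall_at_n(["a", "b", "c", "d"], {"a", "c", "e"}, n=3)
```

Lower MAE and RMSE indicate more accurate rating predictions, while higher precision@N and recall@N indicate better Top-N lists.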

4.2 Experimental Result

To evaluate the recommendation accuracy of our proposal, we compare our method (denoted as TSSRec) with the four comparative approaches. Figure 1 shows the best prediction performance (i.e., the performance under the optimal parameter settings) of all methods in terms of MAE and RMSE. TSSRec achieves lower MAE and lower RMSE than the other four approaches.

Fig. 1. Comparison of prediction performance in MAE and RMSE.

Figure 2 shows the performance of Top-N (N = 3, 5, 7) recommendations. Figure 2(a) and (b) present the precision@N and recall@N of all methods, respectively. TSSRec also outperforms the other methods in Top-N (N = 3, 5, 7) recommendation accuracy.

Fig. 2. Comparison of Top-N recommendation accuracy. (a) Comparison in precision@N. (b) Comparison in recall@N.

These experimental results indicate that temporal influence is important for service recommendation in a dynamic cloud environment, and that our method improves recommendation accuracy over the comparative approaches.

5 Related Work

Several studies have integrated temporal influence into collaborative recommendation methods. Hu et al. [4] integrate temporal information into both similarity measurement and QoS prediction by considering the time gap between the recommendation time and the time at which a previous rating occurred. The work in [5] incorporates time information into recommendations based on probabilistic models. Wang et al. [13] adapt matrix factorization techniques to learn user-group affinity based on two different implicit engagement metrics. More recently, additional influence factors, such as location influence and social influence, have been incorporated into recommender systems. Lian et al. [14] propose a collaborative location recommendation framework that exploits the relations between users, activities and locations. The work in [15] focuses on jointly modeling user check-in behaviors for real-time POI recommendation. Wang et al. [16] propose a spatial-temporal QoS prediction method in which temporal QoS prediction is formulated as a generic regression problem and a zero-mean Laplace prior distribution is assumed for the residuals of the QoS prediction.

6 Conclusion

In this paper, a temporal-sparsity aware service recommendation method based on hybrid CF techniques is proposed. Specifically, temporal influence is incorporated into the classical neighborhood-based CF model by distinguishing temporal QoS metrics from stable QoS metrics. Accordingly, global nearest neighbors and temporal nearest neighbors are defined. A time-aware latent factor model based on CP decomposition is then integrated into the neighborhood model to mine the temporal similarity relationships between services and to address the data sparsity problem. Finally, experiments on a real-world service dataset are conducted to demonstrate the effectiveness of our proposal. In future work, we will further study collaborative recommendation models based on multi-model integration and consider more context information in the dynamic cloud environment.