1 Introduction

If This Then That (IFTTT) is a popular platform that deploys applications for end users using the Trigger-Action Programming (TAP) approach. TAP has recently been widely adopted because it is a simple programming model for smart home environments. To date, several thousand users have programmed applications to automate tasks across different services. The applications programmed in IFTTT vary in their domain; the most popular domains are social media and communications. Lately, more services that work with the Internet of Things (IoT) have been introduced. Over time, new services have been adopted into the platform, and the TAP paradigm has been shown to satisfy most end-user needs in smart home environments [10].

In a growing ecosystem of services and end users, relevant services that help end users in their everyday tasks are required. Current work on recommender systems focuses primarily on recommending general content such as movies, music, articles, and news pages. In the domain of service composition, there have been efforts to select and recommend services, and in the domain of mashups, recommender systems have been developed to assist mashup development. However, to the best of the authors' knowledge, there has been no work to date on recommendation algorithms for service mashups created with the TAP paradigm.

In our work, we treat mashups as recommendation content and specialize our algorithm for TAP mashups. We aim to provide useful trigger-action mashups that support users in their everyday activities. To this end, we analyze the TAP structure and existing mashups to provide more meaningful recommendations.

This paper describes a model for mashup recommendation tailored to the TAP paradigm. First, we create a strategy to extract a rating for mashups from explicit and implicit features. We then provide a selection of features that are meaningful for the mashup creation process. Afterwards, we present the baseline algorithms used for evaluation, followed by singular value decomposition++ (SVD++) with TAP factors. Finally, we evaluate our approach by reporting the results of an experiment on a real-world dataset from IFTTT.

2 Related Work

The mashup is a popular technique developed to allow end users to build applications easily by reusing and combining services. It also shortens the application development life cycle.

There are several tools that assist in the creation of mashups. MashupAdvisor [2] is specialized for the mashup design process; it relies on the popularity of mashups and the semantic similarity between recommended goals. VisComplete [5] acts as an auto-complete mechanism for mashup pipelines, suggesting modules and connections to generate mashups; it uses graph similarity and data mining algorithms to rank predictions of pipeline structures. Autocompletion for mashups [4] is a project that developed an auto-completion tool for service mashups using a recommendation technique; the authors developed a data model and ranking metrics to provide mashup recommendations through a top-k recommendation algorithm. A dataflow-pattern-based recommendation framework for data service mashups [12] proposes a method to analyze the relationships between data services and dataflow patterns in order to recommend targeted data services; for this purpose, it uses gSpan, a graph-based substructure pattern mining method. WiSer [1] is a Web API selection framework that exploits multidimensional service descriptions; it relies on revising dimensional attributes to conform to developer preferences and constraints.

In summary, there have been studies on assisting the creation of mashups. However, we have not found prior work on recommending TAP mashups, especially for their practical reuse in platforms such as IFTTT.

Table 1. Recipe fields in the dataset and their corresponding descriptions.

3 Feature Selection from Trigger-Action Programming

To develop our algorithm, we used the 200,000-recipe IFTTT dataset provided by Ur et al. [11]. The dataset contains 224,536 different TAP recipes, reused more than 11 million times and developed by 106,427 different mashup programmers. Recipes are structured with two main components: a trigger and an action. A trigger is a condition provided by a service that lets an action be executed. To better describe the structure of each recipe, it is helpful to remember that a mashup corresponds to a recipe in the IFTTT ecosystem. Table 1 describes the trigger and action fields, together with an example recipe.
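For illustration, a recipe can be represented as a flat record. The field values below are hypothetical, and the field names only loosely follow Table 1:

```python
# A hypothetical IFTTT recipe (i.e., one mashup).  A trigger from one
# service channel is paired with an action from another channel.
recipe = {
    "title": "Save my new Instagram photos to Dropbox",
    "trigger_channel": "instagram",
    "trigger": "any_new_photo_by_you",
    "action_channel": "dropbox",
    "action": "add_file_from_url",
    "shares": 1200,          # how many users adopted this recipe
}
```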

To create a feature set for recommending mashups built with the TAP paradigm, we analyzed the IFTTT dataset and selected features relevant to mashup recommendation. Some are explicit features contained in the data, and others were derived as implicit features. We used some of them to create a rating strategy and others as implicit feedback features to improve predictions. We use index letters to distinguish between users u, v and items i, j.

In this study, the items for recommendation are mashup entities. A rating \(r_{u,i}\) denotes a rating by user u on item i, whereas \({\hat{r}}_{u,i}\) denotes a predicted rating for user u and item i.

3.1 Rating Extraction

The main issue in recommendation with the IFTTT dataset is that there are no explicit user ratings for recipes. To address this, we created a rating measurement extracted from the dataset.

The first mashup feature we use is the number of shares. The number of shares of recipe i in the IFTTT dataset is denoted as \(N_{shares}(i)\). This is an explicit feature expressed as the number of adoptions in the dataset, and we regard it as a measurement of adoption and popularity among users. To use the number of shares as a rating measurement, we apply a logarithmic transformation. Our explicit rating is calculated as follows:

$$\begin{aligned} r_{u,i}^{\text {number of shares}} = \log (N_{shares}(i)+1) \end{aligned}$$
(1)
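As an illustration, Eq. (1) reduces to a one-line log transformation. The function name below is ours, not part of the dataset tooling:

```python
import math

def shares_rating(n_shares: int) -> float:
    """Log-transformed rating from the number of shares, as in Eq. (1).

    The +1 offset maps recipes with zero shares to a rating of 0
    instead of an undefined log(0).
    """
    return math.log(n_shares + 1)
```

The transformation compresses the long-tailed share counts, so a recipe shared thousands of times does not dominate the rating scale.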

During the analysis of this dataset, we found that some recipes have been recreated multiple times. Among all the possible trigger-action combinations, there are only a few unique combinations: only 7% of the dataset contains unique combinations of triggers and actions, while the rest of the mashups have been recreated by different users. At the same time, we found that adoption and recreation are different mashup-design behaviors. For this reason, we define a count function count(i) that calculates the number of times a trigger and an action are recreated in the dataset, and apply a logarithmic transformation to the count. We thus define the rating measurement based on the recreation count as follows:

$$\begin{aligned} r_{u,i}^{recreation} = \log (count(i)+1) \end{aligned}$$
(2)

We then integrate both measurements into a single rating measurement (3) by averaging them and applying min-max scaling. The scaling, denoted as scale(), fits the distribution into a range from 1 to 5.

$$\begin{aligned} r_{u,i} = scale(avg(r_{u,i}^{\text {number of shares}}, r_{u,i}^{recreation})) \end{aligned}$$
(3)
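A minimal sketch of the combined measurement in Eq. (3), with function and example values of our own: the two log measurements are averaged and then min-max scaled into [1, 5]:

```python
import math

def min_max_scale(values, lo=1.0, hi=5.0):
    """Min-max scale raw ratings into the range [lo, hi], as in Eq. (3)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:                       # degenerate case: all values equal
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# Hypothetical (shares, recreation count) pairs for three recipes.
pairs = [(0, 1), (10, 3), (1000, 50)]
raw = [(math.log(s + 1) + math.log(c + 1)) / 2 for s, c in pairs]
scaled = min_max_scale(raw)               # ratings now lie in [1, 5]
```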
Table 2. Features found in the dataset and their corresponding descriptions.

3.2 Feature Selection

To choose features that are effective for predicting user ratings \({\hat{r}}_{u,i}\), we explored the IFTTT dataset looking for information that generates information gain [8]. Such features provide more information about users and their items; in particular, they give us insight into which information about a mashup is relevant for accurate rating predictions. We also considered implicit features, such as triggers paired with more than one action or actions paired with more than one trigger. These kinds of binary features may improve prediction accuracy [9].

Fig. 1. Pearson correlation analysis by rating and feature pair.

We organized the features according to the category of characteristics they describe: recipe, author, trigger, and action, as shown in Table 2. To select a set of relevant features, we analyzed our variables with the Pearson product-moment correlation coefficient (PPMCC), as shown in Fig. 1. Based on this analysis, we decided to group the highly correlated features. We also include binary measurements for variables that are above the population average; they are useful for capturing implicit information contained in the dataset.
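For reference, the PPMCC between two feature columns can be computed directly; the example feature vectors below are hypothetical:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical feature columns: trigger use count vs. actions bound to it.
trigger_use = [3, 10, 25, 4, 18]
actions_bound = [1, 4, 9, 2, 7]
r = pearson(trigger_use, actions_bound)   # near 1 for strongly correlated features
```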

We found that among our rating strategies, shown in Fig. 1, the composed rating has a higher correlation with the recreation count than with the number of shares. We observed that the same behavior holds in its relationships with the other features.

Moreover, we can observe a high correlation between the features “Triggers bound to actions” and “Action use count”. Similarly, “Actions bound to triggers” is correlated with “Trigger use count”.

4 Recommendation for the TAP Paradigm

One of our aims is to integrate the information on users and items into a single space. For that purpose, we selected singular value decomposition (SVD) as a baseline, which predicts a rating \(r_{u,i}\) as follows:

$$\begin{aligned} {\hat{r}}_{u,i} = \mu + b_{u} + b_{i} + q_{i}^{T}p_{u} \end{aligned}$$
(4)

where \({\hat{r}}_{u,i}\) is the predicted rating, \(\mu \) the overall average rating, \(b_{u}\) the user bias, \(b_{i}\) the item bias, \(q_{i}\) the static item characteristics, and \(p_{u}\) the user preferences.
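Eq. (4) reduces to a few additions and a dot product. A minimal sketch (names are ours):

```python
def svd_predict(mu, b_u, b_i, q_i, p_u):
    """Predicted rating per Eq. (4): global mean, user bias, item bias,
    plus the dot product of item and user latent-factor vectors."""
    return mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))

# Example with two latent factors per user/item.
pred = svd_predict(mu=3.0, b_u=0.1, b_i=-0.2, q_i=[0.2, 0.4], p_u=[1.0, 0.5])
```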

In addition, we consider SVD++ [6], where implicit feedback is integrated into the same space as users and items. This auxiliary information helps improve the modeling of user preferences. SVD++ is trained using stochastic gradient descent.

$$\begin{aligned} {\hat{r}}_{u,i}=\mu +b_{u}+b_{i}+q_{i}^{T}\left( p_{u}+ \left| R_{u}\right| ^{-1/2} \sum _{j\in R_{u}} y_{j} \right) \end{aligned}$$
(5)

where \({\hat{r}}_{u,i}\) is the predicted rating, \(\mu \) the overall average rating, \(b_{u}\) the user bias, \(b_{i}\) the item bias, \(q_{i}\) the static item characteristics, \(p_{u}\) the user preferences, \(R_{u}\) the set of items rated by user u, and \(y_j\) the implicit factors.
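A sketch of the prediction rule in Eq. (5), with hypothetical names; `y_rated` collects the implicit factor vectors \(y_j\) for the items in \(R_u\):

```python
def svdpp_predict(mu, b_u, b_i, q_i, p_u, y_rated):
    """Predicted rating per Eq. (5): the user factors p_u are shifted by
    the |R_u|^(-1/2)-normalized sum of implicit item factors y_j."""
    k = len(p_u)
    norm = len(y_rated) ** -0.5 if y_rated else 0.0
    implicit = [norm * sum(y[f] for y in y_rated) for f in range(k)]
    user = [p + s for p, s in zip(p_u, implicit)]
    return mu + b_u + b_i + sum(q * u for q, u in zip(q_i, user))

# With no implicit feedback the prediction reduces to plain SVD (Eq. 4).
base = svdpp_predict(3.0, 0.1, -0.2, [0.2, 0.4], [1.0, 0.5], [])
with_feedback = svdpp_predict(3.0, 0.1, -0.2, [0.2, 0.4], [1.0, 0.5], [[0.1, 0.1]])
```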

5 Evaluation

We evaluated our approach with the IFTTT dataset obtained from [11], the largest currently available resource for analyzing TAP mashups. We adopted standard offline evaluation metrics, such as the root mean square error (RMSE). For the purpose of our study, we selected collaborative filtering algorithms for the recommendation of mashups, and compared three algorithms to establish a baseline.

The first baseline is a slope one predictor [7], an algorithm that is efficient to query and reasonably accurate. \({R_{i}}(u)\) is the set of relevant items: the items j rated by user u that have at least one user in common with i. dev(i, j) is defined as the average difference between the ratings of i and those of j, used in the following form:

$$\begin{aligned} {\hat{r}}_{ui} = \mu _{u} + \frac{1}{|{{R_{i}}(u)}|} \sum \limits _{j \in {R_{i}}(u)} dev(i, j) \end{aligned}$$
(6)
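The predictor in Eq. (6) can be sketched on a toy rating dictionary. Data and names are illustrative, and a production implementation would precompute the deviations:

```python
def deviation(ratings, i, j):
    """dev(i, j): average of r_{u,i} - r_{u,j} over users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    return sum(ratings[u][i] - ratings[u][j] for u in common) / len(common)

def slope_one_predict(ratings, u, i):
    """Eq. (6): the user's mean rating plus the mean deviation of item i
    from each item j the user has already rated."""
    mu_u = sum(ratings[u].values()) / len(ratings[u])
    devs = [deviation(ratings, i, j) for j in ratings[u] if j != i]
    return mu_u + sum(devs) / len(devs)

ratings = {
    "u1": {"a": 4, "b": 3},
    "u2": {"a": 5, "b": 4, "c": 5},
    "u3": {"b": 2, "c": 3},
}
pred = slope_one_predict(ratings, "u1", "c")
```

In this toy data every item u1 rated shares a user with item "c", so the sketch omits the check that j belongs to \(R_i(u)\).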

The second baseline uses a weighted co-clustering algorithm [3], where users and items are clustered into \(C_u\) and \(C_i\), and co-clusters are defined as \(C_{ui}\). If the user is unknown, the predicted rating is \({\hat{r}}_{ui} = \mu _i\); if the item is unknown, it is \({\hat{r}}_{ui} = \mu _u\); and if both are unknown, it is \({\hat{r}}_{ui} = \mu \). Otherwise, the rating prediction \({\hat{r}}_{ui}\) is calculated as follows:

$$\begin{aligned} {\hat{r}}_{ui} = \overline{C_{ui}} + (\mu _{u} -\overline{C_{u}}) + (\mu _{i} - \overline{C_i}) \end{aligned}$$
(7)

where \(\overline{C_{ui}}\) is the average rating of co-cluster \(C_{ui}\), \(\overline{C_u}\) the average rating of u's cluster, and \(\overline{C_i}\) the average rating of i's cluster.
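Eq. (7) itself is a simple correction of the co-cluster mean; a sketch with hypothetical averages:

```python
def cocluster_predict(co_avg, mu_u, cu_avg, mu_i, ci_avg):
    """Eq. (7): co-cluster average, corrected by how far the user and the
    item each deviate from their respective cluster averages."""
    return co_avg + (mu_u - cu_avg) + (mu_i - ci_avg)

# Hypothetical averages: the user rates 0.2 above their cluster,
# while the item sits 0.1 below its cluster.
pred = cocluster_predict(co_avg=3.8, mu_u=4.2, cu_avg=4.0, mu_i=3.5, ci_avg=3.6)
```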

6 Results

For our evaluation, we tested the co-clustering algorithm, SVD, and SVD++ with TAP factors. We found that SVD++ was the best-performing algorithm, as shown in Fig. 2. It outperformed the others owing to the inclusion of implicit feedback through the TAP factors.

We used SVD++ with various implicit feedback features selected from the dataset. All factors were combined with the implicit evaluation of an item as in [6]; our best performer was the feature “Trigger uses count more than average”. Our results were evaluated with 10-fold cross-validation over 20 epochs, and we present them in Fig. 2.
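The evaluation protocol can be sketched as follows; the fold-splitting helper is ours, and libraries such as scikit-learn provide equivalents:

```python
import math
import random

def rmse(truth, preds):
    """Root mean square error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, preds)) / len(truth))

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

# Each fold serves once as the test set; the remaining folds form the
# training data on which the model is fitted for its 20 epochs.
folds = k_fold_indices(20, k=10)
```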

Fig. 2. Results of the algorithm evaluation.

7 Conclusion

In this work, we presented a model for mashup recommendation for the TAP paradigm. We analyzed the IFTTT 200,000-recipe dataset, from which we extracted rating information for mashups and selected a subset of relevant features. For our evaluation, we conducted experiments with different recommendation algorithms and evaluated the best-performing one, SVD++, in depth. We tested our features with SVD++ using different implicit factors derived from TAP, and found that the factor yielding the most significant improvement was the trigger use count. Although our results did not outperform every alternative algorithm, we will continue working on improving the recommendation of mashups as consumable programs. Owing to the growth of IoT environments and the popularity of trigger-action programming, effectively recommending related mashups to users is an important issue.