
1 Introduction

Meetup is a social networking website for organizing local offline group meetings among people with similar interests. Thousands of Meetup groups, such as fitness, career and networking, photography, and hiking groups, are available to participate in, offering a desirable way to enrich our social life. For example, we can find a fitness partner in a Meetup group so that we can support and encourage each other, or find someone to mentor us on photography. These Meetup groups provide a good way to explore the things we are interested in, meet new friends, broaden our social circle, and even change our careers.

With so many Meetup groups available, a good recommendation model can save users' time in finding interesting groups, while also attracting more members to those groups and making them more active. To this end, we propose to exploit users' historical interactions as well as group features to better match user interests with Meetup groups. The main contributions of this work are summarized as follows:

  • A coupled linear and deep nonlinear recommendation model is proposed to integrate both historical interactions and item side information. It captures both users' historical preferences and item characteristics.

  • We design a pairwise learning algorithm for the proposed approach. To further improve recommendation quality, we also adopt a dynamic negative sampling strategy to select negative samples more effectively.

  • We conduct extensive experiments on two large-scale datasets and demonstrate the superior performance of our approach over state-of-the-art baselines.

The remainder of this paper is structured as follows. In the next section, we introduce the research problem we aim to address. Section 3 introduces the proposed approach. Section 4 presents the experimental setup and results. Section 5 reviews the related work, and Sect. 6 concludes this paper.

2 Problem Formulation

Assuming that there are N items and M users, we have an interaction matrix \(X \in \mathcal {R}^{M \times N}\), most of whose entries are unobserved. Let \(X_{ui}\) denote the preference of user u for item i, and \(X_{u*}\) denote the \(u^{th}\) row of the interaction matrix. For Meetup recommendation, only binary implicit feedback is available, so the task can be viewed as a one-class recommendation problem [12]. The entries of X are defined as follows:

$$\begin{aligned} X_{ui}= \begin{cases} 1, & \text {if interaction } \langle u, i \rangle \text { is observed}\\ 0, & \text {otherwise} \end{cases} \end{aligned}$$
(1)

The goal of the recommendation task is to predict ranking scores for the unobserved entries given the observed interactions, and then generate a personalized ordered list of items for each user based on the predicted scores. For clarity of presentation, Table 1 summarizes the notations used in this paper.
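For illustration, the construction of X from observed interactions can be sketched in a few lines of Python; the helper name and toy data below are ours, for exposition only.

```python
import numpy as np

def build_interaction_matrix(pairs, num_users, num_items):
    """Binary implicit-feedback matrix X of Eq. (1):
    X[u, i] = 1 iff interaction <u, i> is observed."""
    X = np.zeros((num_users, num_items), dtype=np.float32)
    for u, i in pairs:
        X[u, i] = 1.0
    return X

# Toy example: 3 users, 4 items, 4 observed interactions.
X = build_interaction_matrix([(0, 1), (0, 3), (1, 0), (2, 2)], 3, 4)
```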

Table 1. Notations used in this paper

3 Proposed Methodology

In this section, we introduce the proposed methodology in detail. Our model combines a linear component that captures users' historical interactions with a nonlinear component that incorporates the abundant side information.

3.1 Coupled Linear and Deep Nonlinear Model

Sparse linear models have been demonstrated to be effective for top-n recommendation [20]. However, they do not consider any side information. In recent years, deep learning has proven to be well suited to feature representation learning [1]. Therefore, we propose using deep neural networks to learn low-dimensional feature embeddings from raw features. Since both usage history and item properties are critical for uncovering users' real demands and interests, we design a hybrid model that couples a sparse linear model with a deep neural network for better service recommendation. The former (sparse linear model) learns users' interaction patterns, while the latter (deep neural network) aims to understand the content of items.

Formally, let \(A \in \mathcal {R}^{N \times N}\) denote a sparse aggregation coefficient matrix. The ranking score of the linear part is calculated by

$$\begin{aligned} Y_{ui}^{I} = X_{u*} \cdot A_{*i} \end{aligned}$$
(2)

where \(X_{u*}\) is the \(u^{th}\) row of the interaction matrix; it is constructed from the training set, so there is no leakage of the test data. Equation (2) is very similar to matrix factorization: we can view \(X_{u*}\) as the user latent factor and \(A_{*i}\) as the item latent factor. Nevertheless, \(X_{u*}\) is a known vector, while A is a sparse coefficient matrix that needs to be optimized. Moreover, A is reminiscent of the similarity matrix in item-based neighborhood collaborative filtering [15], but it is determined by minimizing a predefined loss rather than being calculated with Cosine or Jaccard similarities from the interaction matrix. Due to the sparse nature of \(X_{u*}\), constraints such as sparsity and non-negativity are imposed on the coefficient matrix A. More details are given in the following text.
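A minimal NumPy sketch of the linear score in Eq. (2) is given below, assuming a toy interaction matrix and a randomly initialized A; in practice A is learned by minimizing the loss of Sect. 3.2 under the constraints above.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 4                                          # toy sizes
X = (rng.random((M, N)) < 0.4).astype(np.float32)    # training interactions

# Coefficient matrix A: non-negative with a zero diagonal (Sect. 3.2);
# randomly initialized here, learned during training in practice.
A = np.abs(rng.normal(scale=0.01, size=(N, N)))
np.fill_diagonal(A, 0.0)

def linear_score(X, A, u, i):
    """Eq. (2): Y^I_ui = X_{u*} . A_{*i}."""
    return float(X[u] @ A[:, i])

y = linear_score(X, A, u=0, i=2)
```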

Another important component of our model is a deep neural network, which integrates side information of items to further enhance recommendation performance. Let \(s_i\) denote the side information of item i. We feed it into a multi-layered neural network to obtain a high-level dense representation. Formally, the multi-layered neural network is defined as follows:

$$\begin{aligned} z_1(s_i)&= \sigma _1(W_1 s_i + b_1) \\ z_2(s_i)&= \sigma _2(W_2 z_1(s_i) + b_2) \\ &\;\;\vdots \\ z_L(s_i)&= \sigma _L(W_L z_{L-1}(s_i) + b_L) \end{aligned}$$

where L denotes the number of layers, and \(W_l\) and \(b_l\) denote the weight matrix and bias vector of the \(l^{th}\) layer. \(\sigma _l\) is the activation function, which could be the sigmoid, hyperbolic tangent (tanh) or rectifier (ReLU) function. With this nonlinear transformation, we capture the complex and intricate structure of the item side information. Let k denote the dimension of the output, so that \(z_L(s_i)\) is a k-dimensional vector. To integrate this neural network into the recommendation model, we define a user latent factor \(P \in \mathcal {R}^{M \times k}\) and model the user-item interaction with an inner product:

$$\begin{aligned} Y_{ui}^{II} = P_{u} \cdot z_L(s_i) \end{aligned}$$
(3)

Finally, we add the two scores to obtain the final predicted ranking score:

$$\begin{aligned} Y_{ui} = Y_{ui}^{I} + Y_{ui}^{II} \end{aligned}$$
(4)
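Putting the pieces together, the following self-contained NumPy sketch illustrates the full scoring function of Eq. (4), using the two tanh hidden layers of 20 units reported in Sect. 4.4; all parameter values are random placeholders rather than trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d, k = 3, 4, 33, 10   # users, items, side-info dim (33 categories), output dim

X = (rng.random((M, N)) < 0.4).astype(np.float32)   # toy interaction matrix
S = rng.random((N, d)).astype(np.float32)           # item side information s_i

# Parameters (random placeholders; learned during training).
A = np.abs(rng.normal(scale=0.01, size=(N, N)))
np.fill_diagonal(A, 0.0)
P = rng.normal(scale=0.1, size=(M, k))              # user latent factors
W1, b1 = rng.normal(scale=0.1, size=(20, d)), np.zeros(20)
W2, b2 = rng.normal(scale=0.1, size=(20, 20)), np.zeros(20)
W3, b3 = rng.normal(scale=0.1, size=(k, 20)), np.zeros(k)

def z_L(s):
    """Deep component: two tanh hidden layers of 20 units (Sect. 4.4)."""
    h = np.tanh(W1 @ s + b1)
    h = np.tanh(W2 @ h + b2)
    return np.tanh(W3 @ h + b3)

def score(u, i):
    """Eq. (4): Y_ui = Y^I_ui (Eq. (2)) + Y^II_ui (Eq. (3))."""
    return float(X[u] @ A[:, i] + P[u] @ z_L(S[i]))
```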

Figure 1 illustrates the structure of the proposed methodology. The left part is the linear component and the right part is the deep neural network.

Fig. 1. Illustration of the coupled linear and deep nonlinear method. It consists of two components: a linear component used to learn patterns from historical interactions, and a deep neural network used to capture the item features.

3.2 Pairwise Training Algorithm

Training the above model in a pointwise manner is computationally intensive. To accelerate the training process, we propose learning this model with a pairwise algorithm. We adopt a logarithmic loss with a scaling factor that weights the difference between positive and negative samples. Formally, the loss function of our model is defined as follows:

$$\begin{aligned} \mathcal {L}(\theta ) = \sum _{(u,i^+, i^-)} \log (1+ \exp (-\tau \varDelta )) + \lambda \varOmega (\theta ) \end{aligned}$$
(5)

where \(\varDelta = Y_{ui^+} - Y_{ui^-}\), \(i^+\) is a Meetup group that user u joined, and \(i^-\) is a negative item that the user has not interacted with. \(\tau \) is a scaling factor that weights \(\varDelta \). As indicated in Fig. 2(a), \(\tau \) affects the convergence speed, since it strongly influences the slope of the loss function. \(\theta \) denotes the model parameters, including A, P, and the neural network parameters \(W_*\) and \(b_*\).

The regularization terms are critical for model performance. To ensure the sparsity of the coefficient matrix A, we place both \(\ell _1\) and Frobenius norm constraints on it. For the other parameters, we find that the Frobenius norm is sufficient. Thus, we have:

$$\begin{aligned} \varOmega (\theta ) = \parallel A \parallel _1 + \parallel A \parallel _F^2 + \parallel P \parallel _F^2 + \parallel W \parallel _F^2 \end{aligned}$$
(6)

In addition, we set the diagonal of A to zero and clip the values of A after each iteration to ensure \(A \ge 0\).
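The data term of Eq. (5) and the regularizer of Eq. (6) can be sketched as follows; this is our illustration, and in practice the optimizer differentiates these quantities with respect to \(\theta \).

```python
import numpy as np

def pairwise_loss(y_pos, y_neg, tau=2.0):
    """Data term of Eq. (5): log(1 + exp(-tau * (y_pos - y_neg)))."""
    return np.log1p(np.exp(-tau * (y_pos - y_neg)))

def regularizer(A, P, weights, lam=0.001):
    """Eq. (6): l1 + Frobenius norms on A, Frobenius norms on P and the
    network weights, scaled by the regularization rate lambda."""
    omega = np.abs(A).sum() + (A ** 2).sum() + (P ** 2).sum()
    omega += sum((W ** 2).sum() for W in weights)
    return lam * omega
```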

Procedure 1. Pairwise training of the proposed model with dynamic negative sampling.

3.3 Dynamic Negative Sampling

To obtain negative items for each <user, positive item> pair, random sampling is usually adopted. However, this sampling strategy does not lead to optimal solutions. One reason is that it cannot guarantee that all negative items are ranked lower than positive items, and highly ranked negative items hurt the ranking performance of the current model [31]. Figure 2(b) illustrates this point with an example (taken from [31]). If we exchange the positions of the sixth item (observed) and the first item (unobserved), the NDCG increases by 0.302, whereas the increase is only 0.035 if we exchange the sixth item with the fourth item. Therefore, it is better to rank all unobserved items lower than observed items.

Fig. 2. (a): The influence of the parameter \(\tau \) on the pairwise logarithmic loss; (b): Example of NDCG differences obtained by exchanging the positions of observed (yellow) and unobserved (gray) items (best viewed in color).

This idea was initially designed for the Bayesian personalized ranking model [21]. We find that the assumption is also reasonable for our approach, so we apply the dynamic negative sampling method to our model. The sampling strategy is as follows: in each epoch, we randomly sample t items from the negative candidates of each <user, positive item> pair, calculate their ranking scores, and treat the highest-ranked item as the negative sample (a sketch is given below). Procedure 1 summarizes the training process of the proposed model.
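A sketch of the sampling step follows; `score_fn` stands for the current model's Eq. (4) score, and the function name is ours.

```python
import numpy as np

def dynamic_negative_sample(u, pos_items, num_items, score_fn, t=5, rng=None):
    """Draw t random unobserved items for user u and keep the one the
    current model ranks highest; score_fn(u, i) is the Eq. (4) score.
    Assumes the user has interacted with far fewer than num_items items."""
    rng = rng or np.random.default_rng()
    candidates = set()
    while len(candidates) < t:
        i = int(rng.integers(num_items))
        if i not in pos_items:
            candidates.add(i)
    # The highest-scored negative is the most informative for the update.
    return max(candidates, key=lambda i: score_fn(u, i))
```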

4 Experiments

In this section, we conduct experiments on two Meetup datasets and compare our approach with several state-of-the-art baselines.

4.1 Datasets Description

The two datasets were collected by Hsieh et al. [11]. We additionally crawled the Meetup group features from the Meetup website. After removing Meetup groups without content information and users who interacted with fewer than 20 Meetup groups, we obtain two subsets: Meetup San Francisco and Meetup New York City. Detailed statistics of the two datasets are summarized in Table 2. The datasets contain thousands of Meetup groups and regular users from San Francisco and New York City. There are 33 categories of Meetup groups, which span most aspects of daily life, including: career & business, education & learning, outdoors & adventure, singles, new age & spirituality, support, games, hobbies & crafts, socializing, paranormal, cars & motorcycles, language & ethnic identity, parents & family, photography, music, sports & recreation, alternative lifestyle, tech, fine arts & culture, LGBT, movements & politics, religion & beliefs, pets & animals, fashion & beauty, fitness, food & drink, writing, sci-fi & fantasy, movies & film, book clubs, health & wellbeing, community & environment, dancing. The category distributions of the two cities are shown in Fig. 3.

Table 2. Statistics of the Meetup San Francisco and Meetup New York City datasets.
Fig. 3. Statistics of the Meetup category distributions for San Francisco and New York City.

4.2 Evaluation Metrics

To evaluate recommendation accuracy, we report results in terms of five evaluation metrics, two of which also consider ranking quality [22]. The five metrics are: Precision@N, Recall@N, Mean Average Precision (MAP), Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG).

Since in most cases users only care about the topmost recommended items, we evaluate these metrics at a given cut-off n. The definitions are as follows.

$$\begin{aligned} Precision@n&= \frac{\# \,\text {of items the user interacted with in top } n}{n} \end{aligned}$$
(7)
$$\begin{aligned} Recall@n&= \frac{\# \,\text {of items the user interacted with in top } n}{\text {total} \,\# \,\text {of items the user interacted with}} \end{aligned}$$
(8)

The former two metrics ignore the ranked positions. MAP assesses the average accuracy of the overall ranking list; it is the mean of the average precision (AP) over all users. Here, \(\mathbf {1}_{rel}(i)\) is an indicator that equals 1 if user u has interacted with item i, and 0 otherwise.

$$\begin{aligned} AP(u)&= \frac{\sum _{j=1}^N Precision@j \times \mathbf {1}_{rel}(j)}{\# \,\text {of relevant items}} \end{aligned}$$
(9)
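For concreteness, a per-user sketch of Eqs. (7)-(9) is given below; this is our code, and AP is truncated at the cut-off n for brevity, whereas Eq. (9) sums over all N positions.

```python
def precision_recall_ap(ranked_items, relevant, n=10):
    """Precision@n and Recall@n (Eqs. (7)-(8)) and AP (Eq. (9)) for one
    user, with AP truncated at the cut-off n for brevity."""
    hits, precision_sum = 0, 0.0
    for j, item in enumerate(ranked_items[:n], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / j   # Precision@j at each relevant hit
    return hits / n, hits / len(relevant), precision_sum / len(relevant)
```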

In practice, ranking the items that interest the target user higher enhances the quality of the recommendation list. Therefore, we also employ two popular rank-aware evaluation metrics: MRR and NDCG. MRR focuses on the single highest-ranked relevant item: it calculates the reciprocal of the rank at which the first relevant item appears. NDCG evaluates the ranking quality of the overall recommendation list. The definitions of MRR and NDCG are as follows:

$$\begin{aligned} MRR = \frac{1}{M}\sum _{u=1}^{M} \frac{1}{rank_u} \end{aligned}$$
(10)

Here, \(rank_u\) is the rank of the first correct item for user u.

$$\begin{aligned} DCG_n = \sum _{i=1}^n \frac{\mathbf {1}_{rel}(i)}{\log _2(i+1)} \end{aligned}$$
(11)

Finally, \(NDCG_n = DCG_n/IDCG_n\), where \(IDCG_n\) denotes the DCG of the perfectly ranked list.
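The two rank-aware metrics can be sketched per user as follows, with binary relevance as in Eq. (11); the function names are ours.

```python
import math

def mrr_user(ranked_items, relevant):
    """One user's term in Eq. (10): 1 / rank of the first relevant item."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_n(ranked_items, relevant, n=10):
    """NDCG_n = DCG_n / IDCG_n with the binary gain of Eq. (11)."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, item in enumerate(ranked_items[:n], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```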

4.3 Comparison Baselines

We compare our approach with the following seven baselines, ranging from traditional to recently proposed advanced methods:

  • Random. We randomly select Meetup groups from all possible candidates and recommend them to users.

  • MostPopular. A non-personalized method that generates recommendations based on item popularity and recommends the most popular items to every user.

  • ItemKNN [2]. The item-based collaborative filtering method recommends items that are similar to other items the user has liked. Here, the similarity between items is computed with the cosine function.

  • BPRMF [21]. BPRMF is a competitive baseline for ranking prediction. It also employs a pairwise ranking loss and is optimized with the Bayesian Personalized Ranking algorithm on implicit feedback.

  • WRMF [12]. This algorithm is designed for one-class recommendation. It minimizes squared errors in a pointwise manner and adopts a weighting strategy to control the gradients of the user and item latent factors.

  • SLIM [20]. SLIM is a top-n recommendation model that uses a sparse linear method to generate recommendations by aggregating user purchase and rating profiles. For efficiency, we optimize its objective function under the Bayesian personalized ranking criterion.

  • CML [10]. Collaborative metric learning considers the distances between users and items and adopts the metric learning idea to learn user and item vectors. Here, we train this model with the hinge loss (without WARP) due to the scalability issue of the WARP loss [27].

For Random, MostPopular, ItemKNN, BPRMF, WRMF and SLIM, we use the implementations in MyMediaLite [3]. We implemented our approach and CML with TensorFlow. Since SLIM and CML have been shown to outperform many other baselines such as [25], we do not report those baselines further.

Table 3. Performance comparison in terms of Precision@5, Precision@10, Recall@5, Recall@10 and MAP on Meetup San Francisco.

4.4 Implementation Details

We implement our model with TensorFlow and test it on a Linux machine. All learnable parameters are initialized from a random normal distribution, and we use the Adam algorithm [14] to learn the optimal parameters. Hyper-parameters are tuned by grid search. For the deep neural component, we use two hidden layers of 20 neurons each, and the output dimension k is set to 10. We use tanh as the nonlinear activation. The inputs of the deep neural component are the categories of the Meetup groups. The learning rate is set to 0.001 and the regularization rate \(\lambda \) is set to 0.001. The batch size is set to 1024, the scaling factor \(\tau \) to 2, and the dynamic negative sampling size t to 5. We randomly split each dataset into a training set and a testing set with a ratio of 5:1, and report average results over five different splits. The parameters of the baselines are also tuned carefully to achieve their best performance.
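For reference, the stated hyper-parameters can be collected into a single configuration; this is a plain restatement of this subsection, not the authors' code.

```python
# Hyper-parameters as reported in this subsection (Sect. 4.4).
CONFIG = {
    "hidden_layers": [20, 20],   # two tanh hidden layers
    "output_dim_k": 10,
    "activation": "tanh",
    "optimizer": "adam",
    "learning_rate": 0.001,
    "reg_lambda": 0.001,         # lambda in Eq. (5)
    "batch_size": 1024,
    "tau": 2.0,                  # scaling factor in Eq. (5)
    "dns_t": 5,                  # dynamic negative sampling size
    "train_test_split": "5:1, averaged over five random splits",
}
```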

Fig. 4. MRR (a) and NDCG (b) comparison on Meetup San Francisco (we omit Random as it performs poorly).

Table 4. Performance comparison in terms of Precision@5, Precision@10, Recall@5, Recall@10 and MAP on Meetup New York City.

4.5 Results and Analysis

Tables 3 and 4 and Figs. 4 and 5 show the performance comparison on the two datasets. Our model outperforms all baselines in terms of both accuracy and ranking quality. The overall improvement on Meetup San Francisco is about \(12.53\%\), and that on Meetup New York City is about \(5.08\%\). Latent factor models such as BPRMF and WRMF do not work well on either dataset, especially on Meetup San Francisco, which might be caused by its extreme sparsity. CML performs slightly worse than WRMF. The similarity-based model ItemKNN works well on Meetup New York City, but it is computationally expensive at the prediction stage. SLIM is a very strong baseline; our model is built upon SLIM yet outperforms it by a large margin. The main reason is that our model can capture the content of items and optimizes the results with a more reasonable sampling strategy.

In addition, we compare our model with SLIM in terms of convergence speed. Figure 6 shows how the MAP, NDCG, Precision@10 and Recall@10 of our model, SLIM and ItemKNN vary on Meetup San Francisco as the number of training epochs increases. Our model converges much faster than SLIM, taking only about 15 iterations to achieve its best performance. This is mainly due to the dynamic negative sampling method we adopted, as this sampling strategy helps our model find comparably informative negative samples.

Fig. 5. MRR (a) and NDCG (b) comparison on Meetup New York City.

Fig. 6. Convergence of our model and SLIM in terms of (a) MAP, (b) NDCG, (c) Precision@10 and (d) Recall@10. Overall, our model converges much faster than SLIM.

5 Related Work

In this section, we briefly review related work on deep learning based recommendation and event recommendation.

5.1 Deep Learning for Recommender System

In recent years, deep learning has been revolutionizing recommender systems. The achievements of deep learning based recommender systems in both industry and academia are inspiring and enlightening [28]. There is a variety of deep learning techniques [5], and most of them can be applied to recommendation tasks in some form. For example, Convolutional Neural Networks (CNNs) can extract features from the textual [13] and visual information [6] of items and users. Recurrent Neural Networks (RNNs) are capable of modeling the temporal dynamics and sequential patterns of historical interactions [9]. Autoencoders can learn salient feature representations from side information to enhance recommendation quality [25, 29, 30]. Several deep learning techniques can even be combined to form a powerful composite recommendation model. Deep learning algorithms can also be integrated into conventional recommendation methods such as matrix factorization, factorization machines and collaborative metric learning [7, 8, 23]. There are two major motivations for applying deep learning techniques to recommender systems. First, deep learning is powerful in representation learning [1] and thus provides a desirable tool for feature learning in recommender systems [24]. Second, with nonlinear activations, we can add nonlinearity to recommendation models to capture the intricate and complex characteristics of real-world datasets.

5.2 Event Recommendation

Another line of related work is event recommendation, since a Meetup meeting is also a kind of event. Note that in this work we focus on recommending Meetup groups for users to join rather than recommending Meetup meetings (the organizer of a Meetup group can host Meetup meetings regularly, so Meetup group recommendation and Meetup meeting recommendation are two different tasks in Meetup service recommendation); nonetheless, the latter is an important task that we want to address in the future. [17] proposed an event recommendation methodology based on graph random walking and history preference reranking: candidate events are obtained by random walking on a hybrid graph whose different types of nodes represent the available entities in an event-based social network; user preferences are then extracted from the user's attended events, the similarities between her interests and her candidate events are computed, and the recommended event lists are obtained by combining the two similarity scores. [26] proposed a Social Information Augmented Recommender System (SIARS), which fully exploits the social influence of event hosts and group members together with basic context information for event recommendation. [16] formulated the multiple interactions among users, events, groups and locations into a unified framework and proposed a collective pairwise matrix factorization (CPMF) model to estimate users' pairwise preferences on events, groups and locations. [18] proposed a successive event recommender system based on graph entropy (SERGE) to deal with the new-event cold-start problem by exploiting diverse relations as well as asynchronous feedback in EBSNs. [19] proposed a new link prediction method for the Meetup social network, which recommends events to users according to the events they participated in and their fields of interest. [4] proposed a Bayesian latent factor model (denoted SogBmf), based on the matrix factorization framework, to integrate social group influence with individual preference for event recommendation.

6 Conclusion and Future Work

In this paper, we proposed a coupled linear and deep nonlinear model for Meetup service recommendation. Our model not only captures historical interaction patterns but also learns item features effectively. We explored a novel logarithmic loss for pairwise training of the proposed model. To further enhance accuracy, we adopted a dynamic negative sampling strategy to select informative negative samples, which improves performance and leads to faster convergence. Experiments on two real-world large-scale Meetup datasets showed that our model achieves the best performance for Meetup service recommendation.

In the future, we will explore integrating contextual information such as date, location, social network and weather to better anticipate users' intentions and thus make more satisfying recommendations. We will also explore methods for better Meetup meeting recommendation to enhance the Meetup service user experience.