1 Introduction

Facebook likes, an easily accessible digital record of behavior, have been shown to predict automatically and accurately a range of highly sensitive personal attributes, both factual ones such as age, gender, and parental separation and opinion-related ones such as political views, stance, and happiness [1]. However, the literature shows that most studies use observed likes rather than predicting them [2], which makes it difficult to exploit the information in the large volume of new documents.

Predicting Facebook likes means predicting “who will read and like the current document enough to push the like button”. The task resembles the product recommendation problem, which predicts “who will view and like the product enough to purchase it”. The two problems nevertheless differ in several ways, the most important being the degree of willingness. Product recommendation research can usually draw on both view records and purchase records, which reflect two different degrees of willingness. View records indicate relevance and a slight preference, while purchase records show a strong preference (strong enough that users are willing to spend money). On Facebook, by contrast, we have only like records, whose degree of willingness lies between that of view records and that of purchase records: pushing the like button costs some effort, but far less than making a purchase. This difference in degrees of willingness leads to different challenges.

On Facebook, the social network should help considerably in predicting likes. However, privacy has become a crucial issue for all social media, and social network data such as friend information is no longer easily accessible. Many studies ask volunteers for access to their social networks, but this is not feasible in real applications.

In this paper, we address the Facebook like prediction problem using only public information: the document content and the user engagement with the document. We therefore divide the research problem into two sub-problems: predicting likes from document content and predicting likes from user engagement. The former is realized by calculating similarities between documents, on the intuition that users like to read similar documents, i.e., documents on preferred topics or with preferred viewpoints. The latter is realized by a Restricted Boltzmann Machine (RBM), which has been shown to have good predictive power when enough known records are available [3]. We then combine the two sub-models with a weighted function. By predicting likes, we hope to provide a valuable reference for governments and companies seeking more endorsements for their documents.

2 Related Work

The recommendation problem has been widely studied for decades [3]. It is commonly approached with collaborative filtering (CF), and researchers divide CF methods into item-based, user-based, and hybrid according to their recommendation strategy.

Item-based CF recommends items similar to those the user previously liked or purchased [4, 5], and some such methods have been adopted in business, for example by Amazon [4]. The key step in item-based CF is to represent each item as a vector and then build an item-to-item similarity matrix. For example, the item vector in Linden's work is an M-dimensional vector in which each dimension corresponds to a customer [4]. There are several ways to compute the similarity between items, e.g., cosine similarity, Pearson-r correlation, or adjusted cosine similarity, in which the values in each dimension of the vector are normalized first [5]. Beyond product recommendation, document recommendation is closer to our work. For example, PRES (Personalized REcommender System) used the TF-IDF of each word to form a document vector and applied cosine similarity to recommend web posts based on content [6]. The major difference between PRES and our work is that we focus on predicting the “like” behavior on Facebook, whereas PRES aimed at retrieving hyperlinks relevant to users (as judged by user feedback).

User-based CF recommends items liked by similar users. It follows the same idea as item-based CF: represent each user as a vector according to her like or purchase records, then compute a user-to-user similarity matrix. However, the drawback of using user vectors instead of item vectors is their sparsity, because the number of users is significantly larger than the number of items. To deal with this problem, researchers have incorporated additional information when computing the similarity between users, e.g., social relationships [7], similar interests [8], or similar video viewing records [9]. Studies using user-based CF have shown that user information benefits the recommendation of products [7], documents [8, 10, 11], and even videos [9]. In addition, some studies in the social network domain use belief propagation for marketing [12, 13], which can be considered a variation of user-based CF that exploits user information.

A hybrid model combines item-based and user-based information to make recommendations [14, 15]. The problem then becomes how to combine the two kinds of information, as Melville addressed [14]. In this paper, we propose a late-fusion approach that combines the two kinds of information via a weighted function.

3 Method

As mentioned, we propose a hybrid model that combines probabilities from the item-based model and the user-based model. Given a set of users \( U \) and a set of documents \( D \), the likes form a matrix \( {\mathbf{L}} \in R^{\left| U \right| \times \left| D \right|} \), where each row corresponds to one user and each column corresponds to one document. \( {\mathbf{L}} \) is a sparse matrix whose element \( l_{i,j} \) equals one if user \( u_{i} \) liked document \( d_{j} \) in the dataset. Like prediction then amounts to estimating the “unknown” elements of \( {\mathbf{L}} \), i.e., those for which user \( u_{i} \) did not click like on document \( d_{j} \). We estimate \( l_{i,j} \) using an item-based method and a user-based method. Assuming that the two can be estimated independently, formally we write,

$$ p(l_{i,j} ) = w \cdot p_{user} (l_{i,j} ) + (1 - w) \cdot p_{item} (l_{i,j} ) $$
(1)

where \( p(l_{i,j} ) \) is the estimated value of \( l_{i,j} \), \( p_{user} (l_{i,j} ) \) and \( p_{item} (l_{i,j} ) \) denote the probabilities estimated by the user-based and item-based models, respectively, and \( w \) controls the relative weight of the two models.
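As a minimal sketch of the weighted late fusion in Eq. 1, the snippet below combines two already-computed scores for a single user-document pair. The function name `hybrid_probability` and the example scores are illustrative assumptions, not part of the original experiments.

```python
def hybrid_probability(p_user: float, p_item: float, w: float = 0.5) -> float:
    """Eq. 1: weighted combination of the user-based and item-based estimates."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * p_user + (1.0 - w) * p_item


# Hypothetical scores for one (user, document) pair.
equal_weight = hybrid_probability(0.8, 0.2, w=0.5)  # both models count equally
user_only = hybrid_probability(0.8, 0.2, w=1.0)     # item-based score is ignored
```

With `w = 1.0` the fused score reduces to the user-based estimate, and with `w = 0.0` it reduces to the item-based estimate.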

In the following, we detail how the two probabilities are computed from different features.

3.1 Item-Based Model: BLEU

The item-based model recommends documents that are similar to the documents the user has liked. It rests on two assumptions. First, users have no preference among the documents they liked; that is, they like them all equally. Second, we have a liking history long enough to cover all kinds of documents the user likes, so the liking probability can be modeled via similarities. With this in mind, given a document \( d_{j} \), we compare \( d_{j} \) with every document the user liked before and take the highest similarity among all comparisons as the estimated probability. Formally,

$$ p(l_{i,j} ) \cong \hbox{max} (sim(d_{j} ,d_{m} )),\forall d_{m} \in D_{i} $$
(2)

where \( d_{j} \) is the target document, \( D_{i} \) is the set of documents that user \( u_{i} \) liked, \( d_{m} \) is one document in \( D_{i} \), and \( sim( \cdot ) \) is the similarity function. In this paper, we use the BLEU score as the similarity function.

BLEU is a modified form of precision that compares a candidate translation against multiple reference translations. It is a commonly used measure of machine translation quality based on the n-gram overlap between the candidate translation and the reference translations [16]. To predict whether a user \( u_{i} \) will like the current document \( d_{j} \), we treat \( d_{j} \) as a candidate translation and the set \( D_{i} \), which contains every document this user has liked in the whole dataset, as the reference translations. That is, document \( d_{j} \) is regarded as an alternative way of expressing the content of some document in \( D_{i} \). If the translation quality is high enough to indicate that \( d_{j} \) can be derived from \( D_{i} \), we take the score as the estimated probability that user \( u_{i} \) likes \( d_{j} \). To calculate BLEU scores, documents are tokenized into words and all punctuation is removed. Unlike in machine translation, however, the topics of the reference documents may vary. We therefore calculate the BLEU score between \( d_{j} \) and each reference document \( d_{m} \) separately and report the maximum of these scores.

Then, for each user \( u_{i} \), we sort the probabilities of the documents not in \( D_{i} \) and recommend the top n documents to the user.
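The item-based scoring described in this section can be sketched as follows. This is a simplified single-reference BLEU (clipped n-gram precision with a brevity penalty and no smoothing), not necessarily the configuration used in the experiments: the n-gram order `max_n=2`, the function names, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU of one candidate against one reference (no smoothing)."""
    if not candidate or not reference:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_sum += math.log(clipped / total) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(reference) / len(candidate))
    return bp * math.exp(log_sum)


def item_score(doc, liked_docs):
    """Eq. 2: highest similarity between doc and any document the user liked."""
    return max((bleu(doc, liked) for liked in liked_docs), default=0.0)


def recommend(candidates, liked_docs, n):
    """Rank unseen documents by item_score and return the ids of the top n."""
    scores = {doc_id: item_score(tokens, liked_docs)
              for doc_id, tokens in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy example with hypothetical, already tokenized documents.
liked = [["nuclear", "power", "is", "safe"]]
unseen = {
    "d1": ["nuclear", "power", "is", "clean"],
    "d2": ["weather", "is", "nice", "today"],
}
top = recommend(unseen, liked, n=1)
```

Taking the maximum over reference documents, rather than pooling them as in standard multi-reference BLEU, follows the reasoning above that the liked documents may cover different topics.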

3.2 User-Based Model: RBM

The user-based model recommends documents liked by other users who are similar to the target user. The main challenge in representing a user as a vector is the sparsity of the user's like pattern: a like pattern is a high-dimensional vector with only a few ones, while most of its values are zero. We use a simple deep learning model, the Restricted Boltzmann Machine (RBM), to encode a like pattern into a low-dimensional, dense vector, as shown in Fig. 1 [11].

Fig. 1. A sample RBM network

Formally, for each user, the like pattern is a binary vector \( {\mathbf{L}}_{i} = \{ l_{i,1} ,l_{i,2} , \ldots ,l_{i,j} , \ldots ,l_{i,\left| D \right|} \} \), where \( l_{i,j} = 1 \) if the user likes document \( d_{j} \), and \( \left| D \right| \) is the number of documents in our dataset. The RBM is trained to find the parameters that maximize the probability \( p({\mathbf{L}}_{i} ) \) of the observed like pattern, as shown in Eq. 3.

$$ \theta = \mathop {\arg \hbox{max} }\limits_{{\theta = \{ {\mathbf{W}},{\mathbf{b}}_{{\mathbf{L}}} ,{\mathbf{b}}_{{\mathbf{h}}} \} }} {\mkern 1mu} \ln p({\mathbf{L}}_{i} ) $$
(3)

where \( {\mathbf{W}} \), \( {\mathbf{b}}_{{\mathbf{L}}} \) and \( {\mathbf{b}}_{{\mathbf{h}}} \) are the weights, the bias for the visible layer, and the bias for the hidden layer, respectively. The RBM models the joint probability of the visible layer \( {\mathbf{L}}_{i} \) and the hidden layer \( {\mathbf{h}} \) as in Eq. 4, where the energy function is defined in Eq. 5.

$$ p({\mathbf{L}}_{i} ,{\mathbf{h}}) = \frac{1}{{{\mathbf{Z}}(\theta )}}e^{{ - {\mathbf{E}}({\mathbf{L}}_{i} ,{\mathbf{h}})}} $$
(4)
$$ {\mathbf{E}}({\mathbf{L}}_{i} ,{\mathbf{h}}) = - \sum\limits_{{l_{i,j} \in {\mathbf{L}}_{i} ,h_{k} \in {\mathbf{h}}}} {l_{i,j} } h_{k} W_{j,k} - \sum\limits_{{l_{i,j} \in {\mathbf{L}}_{i} }} {l_{i,j} } b_{{l_{j} }} - \sum\limits_{{h_{k} \in {\mathbf{h}}}} {h_{k} b_{{h_{k} }} } $$
(5)
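To make Eqs. 4 and 5 concrete, the sketch below evaluates the energy of one visible/hidden configuration and normalizes it by brute-force enumeration of all states. It indexes the visible bias by document j, stores `W[j][k]` as a nested list, and is feasible only for toy sizes; the function names and parameter values are assumptions for illustration, not trained values.

```python
import math
from itertools import product


def energy(likes, hidden, W, b_l, b_h):
    """Eq. 5: energy of a visible like pattern and a hidden configuration."""
    interaction = sum(likes[j] * hidden[k] * W[j][k]
                      for j in range(len(likes))
                      for k in range(len(hidden)))
    visible = sum(likes[j] * b_l[j] for j in range(len(likes)))
    hid = sum(hidden[k] * b_h[k] for k in range(len(hidden)))
    return -interaction - visible - hid


def joint_probability(likes, hidden, W, b_l, b_h):
    """Eq. 4, with the partition function Z computed by enumerating all states."""
    Z = sum(math.exp(-energy(list(v), list(h), W, b_l, b_h))
            for v in product([0, 1], repeat=len(b_l))
            for h in product([0, 1], repeat=len(b_h)))
    return math.exp(-energy(likes, hidden, W, b_l, b_h)) / Z


# Hypothetical parameters: two documents (visible units), one hidden unit.
W = [[2.0], [3.0]]
b_l = [0.5, 0.1]
b_h = [-1.0]
e = energy([1, 0], [1], W, b_l, b_h)
```

In practice Z is intractable for tens of thousands of documents, which is why RBM training and prediction rely on sampling-based approximations rather than this enumeration.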

After training, we use the parameters θ to predict likes. One-step Gibbs sampling is adopted to approximate the probability that a user likes a document, as in Eqs. 6 and 7, following [11].

$$ \hat{p}_{k} = p(h_{k} = 1|{\mathbf{L}}_{i} ) = \sigma (b_{{h_{k} }} + \sum\limits_{{l_{i,j} \in {\mathbf{L}}_{i} }} {l_{i,j} } W_{j,k} ) $$
(6)
$$ p(l_{i,j} = 1|\hat{p}_{i} ) = \sigma (b_{{l_{j} }} + \sum\limits_{{\hat{p}_{k} \in \hat{p}_{i} }} {\hat{p}_{k} } W_{j,k} ) $$
(7)

where \( \hat{p}_{k} \in \hat{p}_{i} \) is the probability of the hidden layer given the visible layer, and \( p(l_{i,j} = 1|\hat{p}_{i} ) \) is the like probability of the user for document \( d_{j} \) given the hidden layer. We then use the probability in Eq. 7 to recommend the top n documents for the user.
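The prediction step can be sketched as one deterministic up-down pass through a trained RBM: the hidden-layer probabilities are computed from the user's like pattern, and Eq. 7 maps them back to a like probability for every document. The sketch assumes the standard logistic form for the hidden-layer probabilities; the function names and toy parameters are hypothetical, and training is not shown.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(likes, W, b_h):
    """Probability of each hidden unit being on, given the like pattern."""
    return [sigmoid(b_h[k] + sum(likes[j] * W[j][k] for j in range(len(likes))))
            for k in range(len(b_h))]


def like_probs(p_hat, W, b_l):
    """Eq. 7: like probability of each document given the hidden probabilities."""
    return [sigmoid(b_l[j] + sum(p_hat[k] * W[j][k] for k in range(len(p_hat))))
            for j in range(len(b_l))]


def predict(likes, W, b_l, b_h):
    """One up-down pass: visible -> hidden probabilities -> visible probabilities."""
    return like_probs(hidden_probs(likes, W, b_h), W, b_l)


# Hypothetical trained parameters: two documents, one hidden unit.
W = [[1.0], [-1.0]]
scores = predict([1, 0], W, b_l=[0.0, 0.0], b_h=[0.0])
```

Documents the user has already liked would be excluded before ranking, since only documents outside the training likes are recommended.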

4 Experiments

4.1 Dataset

The experimental dataset was collected from Facebook fan pages related to a single topic: nuclear power. The posting times of the documents span September 2013 to August 2014. A total of 34,402 documents were recorded, together with their author and liker IDs. For documents with duplicated content (usually re-posts), the authors and likers were merged. Although posting and liking may have different implications, we assume by default that authors like whatever they post. In addition, we removed users and documents having fewer than ten likes, and for each user randomly selected 10% of the likes as testing data and the remaining 90% as training data. Table 1 shows the statistics after removing these users and documents.
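The per-user split can be sketched as below. The rounding rule, the guarantee of at least one test like per user, and the fixed seed are assumptions added for the illustration; the source only states the 10%/90% proportions.

```python
import random


def split_likes(user_likes, test_ratio=0.1, seed=0):
    """Randomly hold out a fraction of each user's likes as test data."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, docs in user_likes.items():
        docs = sorted(docs)  # fixed order so the shuffle is reproducible
        rng.shuffle(docs)
        n_test = max(1, round(len(docs) * test_ratio))  # assumed rounding rule
        test[user] = set(docs[:n_test])
        train[user] = set(docs[n_test:])
    return train, test


# Hypothetical user with twenty liked documents.
likes = {"u1": {f"d{k}" for k in range(20)}}
train, test = split_likes(likes)
```

Splitting per user rather than over all likes at once keeps every remaining user represented in both the training and the testing data.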

Table 1. Like statistic

4.2 Baselines

We use label propagation as a baseline to find potential likes. Given the like matrix \( {\mathbf{L}} \) and a transition matrix \( {\mathbf{T}} \) that defines the label transition probability from document to document, the goal of label propagation is to update \( {\mathbf{L}} \) given \( {\mathbf{T}} \). Formally,

$$ {\mathbf{L}}^{{\prime }} = \alpha \cdot {\mathbf{L}}^{0} + \left( {1 - \alpha } \right) \cdot {\mathbf{T}} \cdot {\mathbf{L}} $$
(8)

where \( {\mathbf{L}}^{0} \), \( {\mathbf{L}} \) and \( {\mathbf{L}}^{{\prime }} \) are the prior labels, the labels from the previous iteration, and the updated labels, respectively; \( {\mathbf{T}} \) is the transition matrix; and α is the prior parameter that determines the weight given to the initial labels. Here a label denotes whether the user likes the document. We repeatedly compute Eq. (8) to update \( {\mathbf{L}} \) and predict new likes. Different factors are considered in building the transition matrix \( {\mathbf{T}} \): co-likers, semantic similarity using the BLEU score, and semantic similarity using n-gram vectors. That is, two documents that are liked by similar users or use similar words are assigned a higher probability of being liked by the same user. The co-liker matrix \( {\mathbf{T}}_{L} \) is calculated with the Jaccard coefficient, where \( U(d) \) is the liker set of document \( d \):

$$ {\mathbf{T}}_{L} (i,j) = \frac{{\left| {U(d_{i} ) \cap U(d_{j} )} \right|}}{{\left| {U(d_{i} ) \cup U(d_{j} )} \right|}} $$
(9)
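A minimal sketch of the co-liker baseline follows, combining the Jaccard transition matrix of Eq. 9 with the iterative update of Eq. 8. Since the transition matrix is document-to-document, the sketch stores labels as a document-by-user matrix so that the product of the transition matrix and the labels is well defined. That orientation, the absence of row normalization, the iteration count, and all names are assumptions.

```python
def jaccard_transition(likers):
    """Eq. 9: Jaccard coefficient between the liker sets of every document pair."""
    n = len(likers)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = likers[i] | likers[j]
            if union:
                T[i][j] = len(likers[i] & likers[j]) / len(union)
    return T


def propagate(L0, T, alpha=0.5, iterations=10):
    """Repeat Eq. 8: L' = alpha * L0 + (1 - alpha) * T L."""
    n_docs, n_users = len(L0), len(L0[0])
    L = [row[:] for row in L0]
    for _ in range(iterations):
        L = [[alpha * L0[i][u]
              + (1.0 - alpha) * sum(T[i][j] * L[j][u] for j in range(n_docs))
              for u in range(n_users)]
             for i in range(n_docs)]
    return L


# Toy data: three documents and four users (a, b, c, d), all hypothetical.
likers = [{"a", "b"}, {"b", "c"}, {"d"}]
L0 = [[1.0, 1.0, 0.0, 0.0],   # document 0 liked by a and b
      [0.0, 1.0, 1.0, 0.0],   # document 1 liked by b and c
      [0.0, 0.0, 0.0, 1.0]]   # document 2 liked by d
T = jaccard_transition(likers)
L1 = propagate(L0, T, alpha=0.5, iterations=1)
```

After one iteration, user c receives a nonzero score for document 0 because documents 0 and 1 share liker b, while document 2, which shares no likers with the others, passes nothing on.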

For the semantic similarity using the BLEU score, \( {\mathbf{T}}_{B} \), we use the BLEU score calculated as in Sect. 3.1. For the semantic similarity using n-grams, \( {\mathbf{T}}_{g} \), we first represent each document as a binary vector \( v\left( d \right) \in R^{\left| V \right|} \) over n-gram features, where \( \left| V \right| \) is the number of distinct n-grams in our dataset, including unigrams, bigrams, and trigrams. Then, for any two documents, we compute the cosine similarity of their n-gram vectors to form the transition matrix.
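The n-gram similarity can be sketched without materializing the full binary vectors: a binary vector is equivalent to the set of n-grams it activates, so the dot product is the size of the set intersection and each norm is the square root of the set size. The function names and toy token lists below are illustrative assumptions.

```python
import math


def ngram_set(tokens, max_n=3):
    """Distinct unigrams, bigrams and trigrams (the active binary features)."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def binary_cosine(a, b):
    """Cosine similarity of two binary vectors given as sets of active features."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical tokenized documents.
x = ngram_set(["a", "b"])   # {(a,), (b,), (a, b)}
y = ngram_set(["a", "c"])   # {(a,), (c,), (a, c)}
similarity = binary_cosine(x, y)
```

The set form avoids allocating a vector of length |V| per document, which matters when the n-gram vocabulary is large.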

4.3 Evaluation Metric

For each user \( u_{i} \), let \( D_{i} \) be the set of documents liked by \( u_{i} \) in the testing data. We recommend the top n documents \( D_{n} \) to the user (documents liked in the training data are not recommended). The precision is given by \( \left| {D_{i} \cap D_{n} } \right|/\left| {D_{n} } \right| \), and the recall is given by \( \left| {D_{i} \cap D_{n} } \right|/\left| {D_{i} } \right| \). In the results section, we report the user-averaged precision and the user-averaged recall on the top n documents. We also plot ROC (Receiver Operating Characteristic) curves and PRT (Precision-Recall-Threshold) curves for further discussion.
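The two metrics and their per-user averages can be sketched as follows. The function names and the toy recommendation lists are hypothetical and serve only to illustrate the definitions.

```python
def precision_recall_at_n(recommended, relevant):
    """Precision and recall of one user's top-n list against the test likes."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def user_average(recs, truth):
    """Average precision and recall over all users that have test likes."""
    pairs = [precision_recall_at_n(recs[user], truth[user]) for user in truth]
    count = len(pairs)
    return (sum(p for p, _ in pairs) / count,
            sum(r for _, r in pairs) / count)


# Hypothetical top-4 recommendations and held-out likes for two users.
recs = {"u1": ["d1", "d2", "d3", "d4"], "u2": ["d5", "d6", "d7", "d8"]}
truth = {"u1": {"d1", "d9"}, "u2": {"d5", "d6"}}
avg_precision, avg_recall = user_average(recs, truth)
```

Because each user has few test likes relative to n, precision is bounded well below 1 even for a perfect ranking, which is worth keeping in mind when reading the reported precision values.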

5 Predicting Facebook Likes

We first report the prediction performance of the item-based model and the user-based model, and then the results of the hybrid probability model.

5.1 Item-Based Model

With the item-based model, documents with similar content are recommended to the same set of users. The PRT curve in Fig. 2 shows the precision and recall when recommending different numbers of documents per user. Figure 2 reveals the limitation of the item-based model on Facebook data: recall is below 5% and precision is below 0.5%. Because the documents in our dataset all relate to the same topic, a recommendation system based on semantic features has difficulty predicting likes. Even when a document is semantically similar to what a user liked before, it might not interest the user if it holds the opposite opinion.

Fig. 2. PRT for item-based model

5.2 User-Based Model

In the user-based model, we generate a probability for each document-user pair. The same set of documents is recommended to users with similar like records. The PRT curve in Fig. 3 shows that the user-based model achieves higher recall than precision. This tendency can also be found in the related work, where the user-based model has precision in the range of 2–10% and recall in the range of 3–16%. However, our user-based model using RBM achieves higher precision (in the range of 5–10%) and recall (in the range of 2–55%). Compared with the item-based model, the user-based model performs significantly better. This suggests that the user-based model successfully captures users' tastes and predicts documents of interest to the target audience.

Fig. 3. PRT for user-based model

5.3 Hybrid Model

For the hybrid model, we combine the item-based model and the user-based model, setting w to 0.5 (which means the item-based and user-based models are equally important). The PRT curve is shown in Fig. 4. We find that the poor results of the item-based model propagate noise to the hybrid model. This suggests that our late-fusion method using a weighted function can be seriously harmed by noisy components.

Fig. 4. PRT for hybrid model

5.4 Discussion

For comparison, we implement label propagation, which has been widely used in classification [17] and recommendation problems [9]. Figure 5 shows the PRT curves of label propagation with the co-liker feature, and Fig. 6 shows the ROC curves of label propagation with the co-liker feature and of our user-based model using RBM. Label propagation is based on the same information as the RBM model, but the results in Fig. 6 show that the proposed model successfully utilizes the visible units of the RBM and largely improves the performance. Label propagation does not perform well on the like prediction problem, mainly because of the sparsity of the like matrix. The user-based model using RBM, in contrast, can encode the user information into a dense vector and decode it to generate the like prediction.

Fig. 5. PRT for label propagation (co-liker)

Fig. 6. ROC for RBM and label propagation with co-liker

Figures 7 and 8 show the PRT curves of label propagation with the n-gram vector similarity and the BLEU score, respectively. Figure 9 shows the ROC curves of the above label propagation models and our item-based model. Figures 7 and 8 indicate that like prediction methods using semantic features are limited. In addition, the BLEU score performs slightly better than the simple cosine similarity of n-gram vectors.

Fig. 7. PRT for label propagation (n-gram vector)

Fig. 8. PRT for label propagation (BLEU score)

Fig. 9. ROC for item-based model, label propagation with n-gram vector, and label propagation with BLEU score

Figure 9 shows that finding documents similar to the liked documents works better than propagating like information over a semantic similarity matrix. Receiving like information from the most similar documents is better than receiving it from all documents. This phenomenon is also reported in related work, which suggests finding the nearest neighbors and propagating information only to them (four nearest neighbors in that paper) [18].

Figure 10 summarizes the impact of the weight w in our hybrid model. It shows that the model relying purely on the user-based model (w = 1.0) achieves the best performance. Although the results suggest that the item-based model has limited performance on the like prediction problem when the dataset contains only one topic, the results in Figs. 7, 8 and 9 show that the item-based model still has its merit.

Fig. 10. ROC for different weights

6 Conclusion

In this paper, we have proposed recommendation models based on the document content or the user engagement. The proposed models successfully utilize the similarity between documents and the probabilities from the visible units of the RBM. We have shown that the proposed model outperforms the commonly adopted label propagation model. Moreover, we have shown that the item-based model relying on semantic features cannot achieve results as satisfactory as those of the user-based model. In the future, we will test more similar models and methods for integrating user-based and item-based information to improve the probability approximation for newly arriving documents.