1 Introduction

Recent years have witnessed an explosion of multimedia content, involving images, videos, and text, in social media networks. Users are able to produce, view, share, and reproduce content in a number of social scenarios, and even interact with media to create additional metadata such as tags and comments. To help users conveniently digest media content, many tools, such as tag-based image search and Flickr groups, have been developed, and considerable research effort has been devoted to this area. In this paper, by investigating the varied facilities in social media networks, we study Flickr groups and address the problem of automatically recommending Flickr groups to users.

Flickr groups, a social connection feature on Flickr, are self-managed communities with common interests, where users can share and comment on photos. An example of a Flickr group is shown in Fig. 1, and example photos from its members are shown in Fig. 2. Flickr groups are not only containers of media content but also bridges connecting users in social media. Groups are created spontaneously but not randomly: people participate in groups with specific intentions (e.g., they are interested in the visual content or are introduced by other users with similar interests), and the photos in a group usually share a common theme. The current system is able to index and retrieve groups so that users can conveniently search for and discover groups of interest. However, to help users reach desired groups more easily, it is necessary to deploy an automatic group recommendation system on top of the social media infrastructure.

Fig. 1 An example of a Flickr group: Dog Days

Fig. 2 Example photos from two randomly selected members of the Dog Days Flickr group

Our principle for recommending groups to users is that a user and the recommended groups should have a large probability of sharing similar latent interests, which can be discovered and mined from the rich information available in social media networks, including metadata and the images uploaded and shared in the groups. Moreover, rather than separately discovering the latent interests and subsequently learning the recommendation function from the available links between users and groups, we propose a social topic model to simultaneously discover latent interests for users and groups and estimate the recommendation function. Specifically, we first present a probabilistic latent topic model to capture the interests of users and groups. To match the interests of users and groups, we impose the restriction that the two topic models share common latent topic bases. Next, we enhance the latent interest discovery process by capturing the social link structure to connect the common interest topics between users and groups. Simultaneously, a recommendation function is embedded over the social connections underlying the discovered latent topics. An effective inference approach based on Gibbs sampling is applied to efficiently learn the latent topics and the recommendation function.

2 Related work

Flickr groups have become one of the most representative social media communities, connecting hundreds of thousands of interest groups. Their commercial success has attracted increasing research attention to the many interesting social phenomena occurring on this platform. One of the pioneering works analyzes the Flickr group ecosystem (Negoescu and Gatica-Perez 2008a, 2010) and designs new group-search strategies using a topic-based representation for groups computed from group tags. Topickr (Negoescu and Gatica-Perez 2008b) explores discovered topic-based representations for users and groups and furthermore ranks users and groups for each topic to provide another Flickr exploration experience. Hypergroup (Negoescu et al. 2009) presents an approach to clustering Flickr groups in order to aid group search. The semantic hierarchies of Flickr groups are exploited in (Lu and Li 2010). Flickr groups have also been used to help find landmark photos (Abbasi et al. 2009).

All these works encourage the development and deployment of an automatic group recommendation system for Flickr. The approaches in (Yu et al. 2009a, b) discover latent events or topics by mining the visual content and tags of images and then recommend photos to groups by matching their latent events or topics. The SheepDog system (Chen et al. 2008) recommends Flickr groups for photos by detecting photo concepts. A semi-automatic approach that combines group classifiers is presented to keep humans in the loop of suggesting Flickr groups to users (Cai et al. 2011). Recommending Flickr groups to users is more challenging because a user's profile contains a set of photos as well as existing user-group relations. This motivates us to investigate a new way to perform group recommendation.

Suggesting groups to users has been studied in (Zheng et al. 2010a, b) by casting it into the collaborative filtering framework and directly applying tensor-based approaches for recommendation. Those approaches neglect the rich visual information associated with the photos. In contrast, our approach explores both visual content and social relations to discover and match the latent interests of users and groups for recommendation.

Besides photo-sharing networks, social communities have also been investigated in other social networks, including Facebook and Orkut. An approach to modeling social groups by aggregating individual user models is presented in (McCarthy et al. 2007), in order to cope with the potentially conflicting preferences of multiple users when selecting items for recommendation. A group recommendation approach is presented in (Boratto et al. 2009) that detects intrinsic communities of users whose preferences are similar. The group personality composition is investigated in (Recio-García et al. 2009) for group recommendation. A combinational collaborative filtering approach (Chen et al. 2008) fuses both semantic and user information for community suggestion. Essentially, most current approaches can be cast into the collaborative filtering framework. In contrast, the additional rich visual content in social media networks reflects the interests of users and groups and can help group recommendation.

General recommendation systems have been well investigated, and comprehensive surveys can be found in (Adomavicius and Tuzhilin 2005; Herlocker et al. 2004). Roughly speaking, recommendation techniques are divided into three categories. The first is content-based recommendation, in which the user is recommended items similar to the ones he or she preferred in the past. The widely used methods are based on topic models, such as latent semantic analysis (Deerwester et al. 1990), probabilistic latent semantic analysis (Hofmann 1999, 2004; Negoescu et al. 2009; Negoescu and Gatica-Perez 2008b), and latent Dirichlet allocation (Blei et al. 2003). The second is collaborative filtering, in which the user is recommended items that people with similar tastes and preferences liked in the past. Most methods are based on matrix factorization (Herlocker et al. 2004; Koren 2008, 2010; Koren and Bell 2011; Su and Khoshgoftaar 2009), and probabilistic matrix factorization methods, such as (Ma et al. 2008, 2009), have been proposed to deal with large-scale data and noise. The last is hybrid recommendation, which combines collaborative and content-based methods and benefits from both. The approach proposed in this paper can be viewed as a hybrid approach that exploits both visual content and the existing links between users and Flickr groups.

3 Latent Beta composition

Before moving further, we first propose a generative probabilistic model, Latent Beta composition (LBC), upon which the group recommendation model is built. The basic idea is that users are represented as mixtures over latent topics, where each topic is characterized by a Gaussian distribution. The graphical representation is shown in Fig. 3. LBC assumes the following generative process for each user u; a minimal simulation sketch follows the list.

1. Sample a random vector \( {\boldsymbol{\theta}} = [\theta_1 \cdots \theta_D]^T\) so that \(\theta_d \sim \hbox{Beta}(\alpha, \beta)\).

2. Sample each element \(\phi_{ij}\) of \(\varvec{\Upphi}\) so that \(\phi_{ij} \sim \hbox{Gaussian}(\mu, \lambda^2)\).

3. For each of the N images \(I_i\),

    (a) Sample a topic vector \({\bf z}_i = [z_{i1} \cdots z_{iD}]^T\) so that \(z_{id} \sim \hbox{Bernoulli}(\theta_d)\).

    (b) Sample an image \(I_i\) so that \({\bf f}_i \sim \hbox{Gaussian}(\varvec{\Upphi}\hbox{Diag}({\bf z}_i){\bf w}_i, \sigma^2{\bf I})\).
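For concreteness, the following is a minimal simulation of this generative process, written as a NumPy sketch. The dimensions, seed, and parameter values are our own illustrative choices, and we sample w here only for illustration (in the model, \({\bf w}_i\) is set to \(\varvec{\Upphi}^T {\bf f}_i\), as explained below).

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 20, 50              # number of topics / images (hypothetical values)
alpha, beta = 1.0, 1.0     # Beta prior parameters
mu, lam, sigma = 0.0, 1.0, 0.1

theta = rng.beta(alpha, beta, size=D)    # step 1: per-topic usage probabilities
Phi = rng.normal(mu, lam, size=(D, D))   # step 2: dictionary of D atoms (feature dim = D here)

features = []
for i in range(N):
    z = rng.binomial(1, theta)           # step 3(a): binary topic-selection vector
    w = rng.normal(0.0, 1.0, size=D)     # projection coefficients (sampled only for this sketch)
    mean = Phi @ (np.diag(z) @ w)        # compose the image from the selected atoms
    f = rng.normal(mean, sigma)          # step 3(b): observed feature vector
    features.append(f)
```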

Fig. 3 The graphical representation of latent Beta composition

The variables in this model fall into three categories: the model parameters \(\alpha, \beta, \sigma, \mu, \lambda\), and \({\bf w}_i\); the hidden variables \(\boldsymbol{\theta}\) and \({\bf z}_i\); and the observed variables \({\bf f}_i\). Compared with topic modeling for a corpus of documents, a user (or group) in our problem corresponds to a document, and an image corresponds to a word.

The assumption in the topic model is that the interest of a user (or a group) can be characterized by the latent variables \(\boldsymbol{\theta}\), and an image can be composed of a few topics selected from a dictionary \(\varvec{\Upphi} = [\phi_1 \cdots \phi_D]\). In mathematical form, an image is composed as \(\varvec{\Upphi}\hbox{Diag}({\bf z}){\bf w}\), where \({\bf z}\) is a topic vector and \(\hbox{Diag}({\bf z})\) is a diagonal matrix whose diagonal entries are taken from \({\bf z}\) and which serves to select atoms from the dictionary. \({\bf w}\) is the projection coefficient vector, in which only the entries corresponding to the selected atoms take effect. The d-th entry of \({\bf z}\) is 1 if the d-th topic is used to compose the image and 0 otherwise.

Each entry of the D-dimensional random vector \(\boldsymbol{\theta}\) follows a Beta distribution, so the vector follows the joint distribution

$$ {\boldsymbol{\theta}} \sim \prod_{d=1}^D \hbox{Beta} (\alpha, \beta) = \prod_{d=1}^D \frac{1}{\hbox{B}(\alpha, \beta)} \theta_d^{\alpha - 1} (1 - \theta_d)^{\beta - 1}, $$
(1)

where the beta function \(\hbox{B}(\alpha, \beta)\) appears as a normalization constant to ensure that the total probability integrates to one. Here we use a Beta distribution instead of the Dirichlet distribution of latent Dirichlet allocation (Blei et al. 2003). The basic unit in document analysis is a single word, which often has a single meaning in the context of the document; thus, the Dirichlet distribution is sufficient. In contrast, the basic unit in our problem, a feature vector f composed of an image and its textual feature represented by the vector space model, is itself a mixture of topics. This is natural because an image may not contain a single concept, and even when it does, it cannot be represented by a single topic due to the various appearances of the same concept.

The topic vector z i satisfies the following joint distribution,

$$ {\bf z}_i \sim \prod_{d=1}^D \hbox{Bernoulli} (\theta_d) = \prod_{d=1}^D \theta_d^{z_{id}} (1-\theta_d)^{1-z_{id}}. $$
(2)

In the Gaussian distribution \(\hbox{Gaussian}(\varvec{\Upphi}\hbox{Diag}({\bf z}_i){\bf w}_i, \sigma^2{\bf I})\), the coefficient vector \({\bf w}_i\) is computed as \({\bf w}_i = \varvec{\Upphi}^T {\bf f}_i\). Then \({\bf f}_i\) follows the distribution

$$ {\bf f}_i \sim \hbox{Gaussian}(\varvec{\Upphi}\hbox{Diag}({\bf z}_i){\bf w}_i, \sigma^2{\bf I}) $$
(3)
$$ = \frac{1}{({2\pi\sigma^2})^{\frac{D}{2}}}e^{-\frac{1}{2\sigma^2} \|{\bf f}_i - \varvec{\Upphi}\hbox{Diag}({\bf z}_i){\bf w}_i\|_2^2}. $$
(4)

4 Formulation

Let \(\mathcal{U}\) represent the set of m users. Each user \(u \in \mathcal{U}\) is associated with a set of photos \(\mathcal{I}^u = \{I_{1}^{u}, \cdots, I_{N^{u}}^{u}\}\). Let g represent a Flickr group, and \(\mathcal{G}\) represent the set of n groups. Each group g is associated with a set of photos \(\mathcal{I}^g = \{I_{1}^{g}, \cdots, I_{N^{g}}^{g}\}\). Let \({\bf f}\) (\({\bf f}_i^u\) or \({\bf f}_j^g\)) represent the content feature of an image I (\(I_i^u\) or \(I_j^g\)). Denote the user-group relation by an m × n matrix R, in which \(r_{ug} = 1\) means that the user u is a member of the group g. Initially, only part of the entries are observed to be 1 and the rest are unknown. The task is to find the pairs (u, g) that have large probabilities of being 1, which indicates that the user u will tend to join the group g.

4.1 Joint topic model

We present a social topic model to jointly discover the interests of users and groups and to learn the prediction function that measures how well users match groups. The joint model links users and groups together by building a prediction edge over related users and groups to bridge their interests. The joint model has several advantages. On the one hand, the bridges between users and groups benefit the interest discovery for each other: the join-in relation of a user to a group implies that they share common interests, and the joint topic model with this connection makes the discovered interests more consistent. On the other hand, the prediction function and the interest discovery help each other, since the two are computed simultaneously. The graphical representation is shown in Fig. 4. The joint LBC assumes the following generative process.

1. Sample each element \(\phi_{ij}\) of \(\varvec{\Upphi}\) so that \(\phi_{ij} \sim \hbox{Gaussian}(\mu, \lambda^2)\).

2. The generative process for each user u:

    (a) Sample a random vector \( {\boldsymbol{\theta}}^u = [\theta_1^u \cdots \theta_D^u]^T\) so that \(\theta_d^u \sim \hbox{Beta}(\alpha^u, \beta^u)\).

    (b) For each of the \(N^u\) images \(I_i^u\),

        i. Sample a topic vector \({\bf z}_i^u = [z_{i1}^u \cdots z_{iD}^u]^T\) so that \(z_{id}^u \sim \hbox{Bernoulli}(\theta_d^u)\).

        ii. Sample an image \(I_i^u\) so that \({\bf f}_i^u \sim \hbox{Gaussian}(\varvec{\Upphi}\hbox{Diag}({\bf z}_i^u){\bf w}_i^u, {\sigma_u}^2{\bf I})\).

3. The generative process for each group g:

    (a) Sample a random vector \( {\boldsymbol{\theta}}^g = [\theta_1^g \cdots \theta_D^g]^T\) so that \(\theta_d^g \sim \hbox{Beta}(\alpha^g, \beta^g)\).

    (b) For each of the \(N^g\) images \(I_i^g\),

        i. Sample a topic vector \({\bf z}_i^g = [z_{i1}^g \cdots z_{iD}^g]^T\) so that \(z_{id}^g \sim \hbox{Bernoulli}(\theta_d^g)\).

        ii. Sample an image \(I_i^g\) so that \({\bf f}_i^g \sim \hbox{Gaussian}(\varvec{\Upphi}\hbox{Diag}({\bf z}_i^g){\bf w}_i^g, {\sigma_g}^2{\bf I})\).

4. Sample the relations between users and groups \({\bf R} = [r_{ug}]\) so that \(r_{ug} \sim \hbox{Bernoulli}(h({\bf y}^u, {\bf y}^g; \eta, \rho))\).

Fig. 4 The graphical representation of joint latent Beta composition over groups and users. The meanings of the variables and relations in this model can be found in Sect. 3

In the above process, the superscripts u and g denote user-related and group-related variables, respectively. Additional remarks on this joint model follow. We model the topics for users and groups by assuming that they share the same latent topic dictionary \(\varvec{\Upphi}\), so that interests can be compared directly through the latent topic variables. Using different dictionaries would instead complicate the model and its optimization. We use different conjugate prior parameters α and β for users and groups, based on the observation that users and groups essentially have different interest distributions. The variance parameters \(\sigma_u\) and \(\sigma_g\) in the Gaussian distributions are also different, because the photos from users and from groups have different diversities.

The prediction function \(h({\bf y}^u, {\bf y}^g; \eta, \rho)\) is defined over the latent topics of users and groups. The user interest \({\bf y}^u\) is the averaged aggregation, \({\bf y}^u = \frac{1}{N^u}\sum\nolimits_{i=1}^{N^u} \hbox{Diag}({\bf z}_i^u){\bf w}_i^u\), and the interest of a group is defined similarly. Besides the relations between users and groups, we also aim to discover the latent interests from the image content and, in addition, to exploit the similarity of the latent interests between users and groups. We observed that the photos from a user may cover multiple topics, while the photos from a group often cover a single topic or a few topics; Figs. 1, 2 and 5 illustrate this observation. Based on this observation, we design the prediction function using a max-min criterion,

$$ h({\bf y}^u, {\bf y}^g; \eta, \rho) = \frac{1}{1+ e^{-{\eta}\max_{d \in \{1, \cdots, D\}}\min \{y_d^u, y_d^g\} - \rho}}. $$
(5)

Here the inner min operator aims to find the consensus over the d-th topic between the user u and the group g, and the outer max operator aims to find the dominant common topic between the user and the group.
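A minimal sketch of this prediction function is given below; the interest vectors and parameter values in the example are hypothetical.

```python
import numpy as np

def predict_link(y_u, y_g, eta, rho):
    """Max-min matching score of Eq. 5: the inner min finds the per-topic
    consensus, the outer max keeps the dominant common topic, and a logistic
    function maps the score to a membership probability."""
    s = np.max(np.minimum(y_u, y_g))
    return 1.0 / (1.0 + np.exp(-eta * s - rho))

# Hypothetical interest vectors over D = 4 topics:
y_u = np.array([0.6, 0.1, 0.0, 0.3])
y_g = np.array([0.7, 0.0, 0.0, 0.2])
print(predict_link(y_u, y_g, eta=5.0, rho=-1.0))
```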

Fig. 5 Other example photos from the same two members as in Fig. 2

5 Inference

In this section, we present a Gibbs-sampling algorithm to infer the model parameters. In our inference algorithm, we sample the latent variables \({\bf z}_i\). We first show how the other latent variables are integrated out. For convenience of description, we drop the superscripts u and g and the subscript i when this causes no ambiguity.

Integrating out \(\boldsymbol{\theta}\)

Given a user or a group, we have the likelihood \(p(z_{ij}|\theta_j) = \hbox{Bernoulli}(\theta_j)\) and the conjugate prior distribution, \(p(\theta_j; \alpha, \beta) = \hbox{Beta}(\alpha, \beta)\). Our goal is to integrate out \(\boldsymbol{\theta}\),

$$ \begin{aligned} &p(\{{\bf z}_i\};{\boldsymbol {\alpha,\beta}}) \\ =& \int p(\boldsymbol{\theta};\alpha,\beta)p(\{{\bf z}_i\} | \boldsymbol{\theta})d\boldsymbol{\theta} \\ =& \int (\prod_{j=1}^D \frac{1}{\hbox{B}(\alpha, \beta)} \theta_j^{\alpha-1}(1-\theta_j)^{\beta-1})\prod_{i=1}^N \prod_{j=1}^D \theta_j^{z_{ij}}(1-\theta_j)^{1-z_{ij}}d\boldsymbol{\theta}\\ =& \int \prod_{j=1}^D \frac{1}{\hbox{B}(\alpha, \beta)} \theta_j^{\alpha+\sum_{i=1}^N z_{ij}-1}(1-\theta_j)^{\beta + N - \sum_{i=1}^Nz_{ij}-1}d\boldsymbol{\theta} \\ =& \prod_{j=1}^D \frac{\hbox{B}(\alpha + \sum_{i=1}^Nz_{ij},\beta + N - \sum_{i=1}^Nz_{ij})}{\hbox{B}(\alpha, \beta)}. \end{aligned} $$
(6)
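In implementation, this collapsed probability is best evaluated in log space; the following sketch uses the log-beta function for numerical stability (the function name and array layout are our own choices).

```python
import numpy as np
from scipy.special import betaln

def log_p_z(Z, alpha, beta):
    """Log of Eq. 6. Z: N x D binary topic-assignment matrix for one
    user or group; s collects the per-topic counts sum_i z_{ij}."""
    N = Z.shape[0]
    s = Z.sum(axis=0)
    return np.sum(betaln(alpha + s, beta + N - s) - betaln(alpha, beta))
```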

Integrating out \(\varvec{\Upphi}\)

We have \(p({\bf f}_i | {\bf z}_i, \varvec{\Upphi}; {\bf w}_i, \sigma) = \hbox{Gaussian}(\varvec{\Upphi}\hbox{Diag}({\bf z}_i){\bf w}_i, \sigma^2{\bf I})\) and \(p(\phi_{ij}; \mu, \lambda) = \hbox{Gaussian}(\mu, \lambda^2)\). Our goal is to integrate out \(\varvec{\Upphi}\). The conditional distribution over \(\{{\bf f}_i^u\}\) is

$$ \begin{aligned} &p(\{{\bf f}^u_i\} | \{{\bf z}_i^u\}, \{{\bf w}_i^u\}, {\varvec{\Upphi}}; \sigma_u) \\ =&\prod_{i=1}^{P_u}p({\bf f}_i | {\bf z}_i, {\bf w}_i, \varvec{\Upphi}; \sigma_u) \\ =&\prod_{i=1}^{P_u}\frac{1}{(2 \pi \sigma_u^2)^\frac{D}{2}}e^{-\frac{1}{2\sigma_u^2}\|{\bf f}_i - \varvec{\Upphi}\hbox{Diag}({\bf z}_i){\bf w}_i\|_2^2}\\ =&\frac{1}{(2 \pi \sigma_u^2)^\frac{DP_u}{2}}e^{-\frac{1}{2\sigma_u^2}\|{\bf F}_u - \varvec{\Upphi}{\bf X}_u\|_2^2} \\ =&\frac{1}{(2 \pi \sigma_u^2)^\frac{DP_u}{2}}e^{-\frac{1}{2\sigma_u^2}\hbox{Tr}(({\bf F}_u - \varvec{\Upphi}{\bf X}_u)({\bf F}_u - \varvec{\Upphi}{\bf X}_u)^T)}. \end{aligned} $$
(7)

where \(P_u\) is the number of all images that belong to users, \({\bf F}_{u}=[{\bf f}_1 \cdots {\bf f}_{P_u}]\) and \({\bf X}_{u}=[\hbox{Diag}({\bf z}_1){\bf w}_1 \cdots \hbox{Diag}({\bf z}_{P_u}){\bf w}_{P_u}]\) are matrices in which each column corresponds to one image of a user, and \(\hbox{Tr}(\cdot)\) is the trace operator.

Similarly, the conditional distribution over \(\{{\bf f}_i^g\}\) is

$$ \begin{aligned} &p(\{{\bf f}_i^g\} | \{{\bf z}_i^g\}, \{{\bf w}_i^g\}, \varvec{\Upphi}; \sigma_g) \\ =&\prod_{i=1}^{P_g}p({\bf f}_i | {\bf z}_i, {\bf w}_i, \varvec{\Upphi}; \sigma_g) \\ =&\prod_{i=1}^{P_g}\frac{1}{(2 \pi \sigma_g^2)^\frac{D}{2}}e^{-\frac{1}{2\sigma_g^2}\|{\bf f}_i - \varvec{\Upphi}\hbox{Diag}({\bf z}_i){\bf w}_i\|_2^2}\\ =&\frac{1}{(2 \pi \sigma_g^2)^\frac{DP_g}{2}}e^{-\frac{1}{2\sigma_g^2}\|{\bf F}_g - \varvec{\Upphi}{\bf X}_g\|_2^2} \\ =&\frac{1}{(2 \pi \sigma_g^2)^\frac{DP_g}{2}}e^{-\frac{1}{2\sigma_g^2}\hbox{Tr}(({\bf F}_g - \varvec{\Upphi}{\bf X}_g)({\bf F}_g - \varvec{\Upphi}{\bf X}_g)^T)}, \end{aligned} $$
(8)

where \(P_g\) is the number of all images that belong to groups, and \({\bf F}_{g}=[{\bf f}_1 \cdots {\bf f}_{P_g}]\) and \({\bf X}_{g}=[\hbox{Diag}({\bf z}_1){\bf w}_1 \cdots \hbox{Diag}({\bf z}_{P_g}){\bf w}_{P_g}]\) are matrices in which each column corresponds to one image of a group.

The distribution over \(\varvec{\Upphi}\) is

$$ \begin{aligned} &p(\varvec{\Upphi};\mu,\lambda) \\ =&\prod p(\phi_{ij}; \mu, \lambda) \\ =&\prod \frac{1}{\sqrt{2\pi\lambda^2}}e^{-\frac{1}{2\lambda^2} (\phi_{ij} - \mu)^2} \\ =& \frac{1}{(2\pi\lambda^2)^\frac{D^2}{2}}e^{-\frac{1}{2\lambda^2} \|\varvec{\Upphi} - \mu {\bf E}\|_2^2} \\ =&\frac{1}{(2\pi\lambda^2)^\frac{D^2}{2}}e^{-\frac{1}{2\lambda^2} \hbox{Tr}((\varvec{\Upphi} - \mu {\bf E})(\varvec{\Upphi} - \mu {\bf E})^T)}. \end{aligned} $$
(9)

where E is a D × D matrix whose elements are all equal to 1.

With the above equations we have

$$ \begin{aligned} &p(\{{\bf f}_i^u\},\{{\bf f}_i^g\}| \{{\bf z}_i^u\},\{{\bf z}_i^g\}; \{{\bf w}_i^u\},\{{\bf w}_i^g\}, \sigma_u, \sigma_g,\mu, \lambda) \\ =& \int p(\{{\bf f}_i^u\} | \{{\bf z}_i^u\}, \{{\bf w}_i^u\}, \varvec{\Upphi}; \sigma_u)p(\{{\bf f}_i^g\} | \{{\bf z}_i^g\}, \{{\bf w}_i^g\}, \varvec{\Upphi}; \sigma_g) p(\varvec{\Upphi};\mu,\lambda) d \varvec{\Upphi} \\ =&\frac{1}{(2 \pi \sigma_u^2)^\frac{DP_u}{2}(2 \pi \sigma_g^2)^\frac{DP_g}{2}(2\pi\lambda^2)^\frac{D^2}{2}} \int e^{-\frac{1}{2}\hbox{Tr}(\varvec{\Upphi}{\bf A}\varvec{\Upphi}^T -2\varvec{\Upphi}{\bf B} +{\bf C})}d\varvec{\Upphi} \\ =&\frac{(2\pi)^\frac{D^2}{2}}{(2 \pi \sigma_u^2)^\frac{DP_u}{2}(2 \pi \sigma_g^2)^\frac{DP_g}{2}(2\pi\lambda^2)^\frac{D^2}{2}} |{\bf A}|^{-\frac{D}{2}}e^{-\frac{1}{2}\hbox{Tr}({\bf C}-{\bf B}^T{\bf A}^{-1}{\bf B})}, \end{aligned} $$
(10)

where \({\bf A}=\frac{{\bf X}_u{\bf X}_u^T}{\sigma_u^2}+\frac{{\bf X}_g{\bf X}_g^T}{\sigma_g^2}+\frac{{\bf I}}{\lambda^2}\), \({\bf B}=\frac{{\bf X}_u{\bf F}_u^T}{\sigma_u^2}+\frac{{\bf X}_g{\bf F}_g^T}{\sigma_g^2}+\frac{\mu{\bf E}}{\lambda^2}\), and \({\bf C}=\frac{{\bf F}_u{\bf F}_u^T}{\sigma_u^2}+\frac{{\bf F}_g{\bf F}_g^T}{\sigma_g^2}+\frac{\mu^2{\bf E}^2}{\lambda^2}\).
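The following sketch evaluates the log of the collapsed likelihood in Eq. 10 from the matrices A, B, and C above (our own naming and layout; the quadratic term uses the completed-square form \(\hbox{Tr}({\bf C}-{\bf B}^T{\bf A}^{-1}{\bf B})\), and the feature dimension is taken equal to D as in the model).

```python
import numpy as np

def collapsed_log_likelihood(F_u, X_u, F_g, X_g, sigma_u, sigma_g, mu, lam):
    """Log form of Eq. 10 with Phi integrated out.
    F_*, X_*: D x P_* matrices whose columns correspond to images."""
    D = F_u.shape[0]
    P_u, P_g = F_u.shape[1], F_g.shape[1]
    E = np.ones((D, D))
    A = X_u @ X_u.T / sigma_u**2 + X_g @ X_g.T / sigma_g**2 + np.eye(D) / lam**2
    B = X_u @ F_u.T / sigma_u**2 + X_g @ F_g.T / sigma_g**2 + mu * E / lam**2
    C = F_u @ F_u.T / sigma_u**2 + F_g @ F_g.T / sigma_g**2 + mu**2 * E @ E / lam**2
    _, logdetA = np.linalg.slogdet(A)
    log_norm = (D**2 / 2) * np.log(2 * np.pi) \
             - (D * P_u / 2) * np.log(2 * np.pi * sigma_u**2) \
             - (D * P_g / 2) * np.log(2 * np.pi * sigma_g**2) \
             - (D**2 / 2) * np.log(2 * np.pi * lam**2)
    quad = np.trace(C - B.T @ np.linalg.solve(A, B))  # Tr(C - B^T A^{-1} B)
    return log_norm - 0.5 * D * logdetA - 0.5 * quad
```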

Joint distribution

The joint distribution over \(\{{\bf z}_i^u\}\) is computed as follows,

$$ p(\{{\bf z}_i^u\};\alpha_u,\beta_u)=\prod_{u=1}^{U}p(\{{\bf z}_i\}^u;\alpha_u,\beta_u). $$
(11)

Similarly, the joint distribution over \(\{{\bf z}_i^g\}\) is

$$ p(\{{\bf z}_i^g\};\alpha_g,\beta_g)=\prod_{g=1}^Gp(\{{\bf z}_i\}^g;\alpha_g,\beta_g). $$
(12)

The joint distribution over \(\{{\bf z}_i^u\}\) and \(\{{\bf z}_i^g\}\) is computed as follows,

$$ p(\{{\bf z}_i^u\},\{{\bf z}_i^g\};\alpha_u,\alpha_g,\beta_u,\beta_g) =p(\{{\bf z}_i^u\};\alpha_u,\beta_u)p(\{{\bf z}_i^g\};\alpha_g,\beta_g). $$
(13)

The joint distribution over \(\{{\bf z}_i^u, {\bf f}_i^u\}\) and \(\{{\bf z}_i^g, {\bf f}_i^g\}\) is computed as follows,

$$ \begin{aligned} &p(\{{\bf z}^u_i,{\bf f}^u_i\},\{{\bf z}^g_i,{\bf f}^g_i\}; \Uptheta) \\ =&p(\{{\bf z}_i^u\},\{{\bf z}_i^g\};\Uptheta)p(\{{\bf f}_i^u\},\{{\bf f}_i^g\}|\{{\bf z}_i^u\},\{{\bf z}_i^g\};\Uptheta). \end{aligned} $$
(14)

where \(\Uptheta = \{\alpha_u, \beta_u, \{{\bf w}^u_i\}, \sigma_u, \mu, \lambda, \alpha_g, \beta_g, \{{\bf w}^g_i\}, \sigma_g, \eta, \rho\}\) represents the variables to be estimated.

The distribution over the links is as follows,

$$ p(\{r_{ug}\} | \{{\bf z}_i^u\}, \{{\bf z}_j^g\}; \{{\bf w}_i^u\}, \{{\bf w}_j^g\}) = \prod_{(u, g)} p(r_{ug} | {\bf y}^u, {\bf y}^g; \eta, \rho). $$
(15)

Therefore, the joint distribution over \(\{{\bf z}_i^u, {\bf f}_i^u\}\), \(\{{\bf z}_i^g, {\bf f}_i^g\}\), and \(\{r_{ug}\}\) is written as follows,

$$ \begin{aligned} &p(\{{\bf z}_i^u, {\bf f}_i^u\}, \{{\bf z}_i^g, {\bf f}_i^g\}, \{r_{ug}\}; \Uptheta) \\ = & p(\{{\bf z}_i^u, {\bf f}_i^u\},\{{\bf z}_i^g, {\bf f}_i^g\}; \Uptheta) p(\{r_{ug}\}| \{{\bf z}_i^u, {\bf f}_i^u\}, \{{\bf z}_i^g, {\bf f}_i^g\}; \Uptheta). \end{aligned} $$
(16)

5.1 Algorithm

The algorithm alternates two steps: conditional sampling and parameter estimation. The first step samples the latent topics according to their conditional distributions, and the second step estimates the model parameters.

Conditional sampling

Given the current topic assignments \(\{{\bf z}_i^u\}\) and \(\{{\bf z}_j^g\}\), the conditional distribution with respect to the d-th topic indicator \(z_{id}^u\) of the i-th image of user u can be written as follows,

$$ p(\tilde{z}_{id}^u | -) \propto p(\{\bar{{\bf z}}_i^u, {\bf f}_i^u\},\{{\bf z}_i^g, {\bf f}_i^g\}; \Uptheta) p(\{r_{ug}\}| \{\bar{{\bf z}}_i^u, {\bf f}_i^u\}, \{{\bf z}_i^g, {\bf f}_i^g\}; \Uptheta), $$
(17)

where \(\bar{z}_{ik}^u = z_{ik}^u\) for k ≠ d and \(\bar{z}_{id}^u = \tilde{z}_{id}^u\). Similarly, we can get the conditional distribution \(p(\tilde{z}_{jd}^g | -)\).
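A skeleton of one Gibbs sweep over the binary topic indicators is sketched below. It assumes a hypothetical helper log_joint() that evaluates the log of Eq. 16 for the current state, and exploits the fact that each \(z_{id}\) takes only two values, so the conditional of Eq. 17 is a two-point distribution.

```python
import numpy as np

def gibbs_sweep(Z_users, log_joint, rng):
    """Z_users: list of N x D binary matrices, one per user; entries are
    resampled in place from p(z_{id} | -) of Eq. 17."""
    for Z in Z_users:
        N, D = Z.shape
        for i in range(N):
            for d in range(D):
                logp = np.empty(2)
                for v in (0, 1):          # evaluate the joint at both values
                    Z[i, d] = v
                    logp[v] = log_joint()
                p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))  # normalize in log space
                Z[i, d] = rng.random() < p1
    return Z_users
```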

Parameter estimation

The parameters are estimated in a coordinate-wise manner, and for each coordinate a gradient-based algorithm is adopted. The parameters \(\alpha_U\) and \(\beta_U\) for users can be estimated by maximizing the likelihood,

$$ \begin{aligned} \log L &(\alpha_U, \beta_U | \{{\bf z}_{i}^u\}) \\ =& \sum_{u, d} (\log \hbox{B}(\alpha_U + \sum_{i=1}^{N_u}z_{id}^u, \beta_U + N_u - \sum_{i=1}^{N_u}z_{id}^u) - \log \hbox{B}(\alpha_U, \beta_U)). \end{aligned} $$
(18)

The partial derivatives are written as follows,

$$ \begin{aligned} \frac{\partial}{\partial \alpha_U}& \log L(\alpha_U, \beta_U | \{{\bf z}_{i}^u\}) \\ =& \sum_{u, d} (\psi(\alpha_U + \sum_{i=1}^{N_u}z_{id}^u) - \psi(\alpha_U+\beta_U+N_u) + \psi(\alpha_U+\beta_U) - \psi(\alpha_U)) \end{aligned} $$
(19)
$$ \begin{aligned} \frac{\partial}{\partial \beta_U} & \log L(\alpha_U, \beta_U | \{{\bf z}_{i}^u\}) \\ =& \sum_{u, d} (\psi(\beta_U + N_u - \sum_{i=1}^{N_u} z_{id}^u) - \psi(\alpha_U+\beta_U+N_u) + \psi(\alpha_U+\beta_U) - \psi(\beta_U)), \end{aligned} $$
(20)

where \(\psi(\cdot)\) is the digamma function. A gradient ascent algorithm (equivalently, gradient descent on the negative log-likelihood) can then be adopted to estimate the two parameters; a minimal sketch follows.
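The sketch below performs this estimation step using the digamma gradients of Eqs. 19 and 20; the step size, iteration count, and positivity clamp are our own illustrative choices.

```python
import numpy as np
from scipy.special import psi   # digamma function

def update_alpha_beta(Z_users, alpha, beta, lr=1e-3, iters=100):
    """Gradient ascent on the likelihood of Eq. 18.
    Z_users: list of N_u x D binary topic matrices, one per user."""
    for _ in range(iters):
        g_a = g_b = 0.0
        for Z in Z_users:
            N = Z.shape[0]
            s = Z.sum(axis=0)   # per-topic counts sum_i z_{id}
            g_a += np.sum(psi(alpha + s) - psi(alpha + beta + N)
                          + psi(alpha + beta) - psi(alpha))      # Eq. 19
            g_b += np.sum(psi(beta + N - s) - psi(alpha + beta + N)
                          + psi(alpha + beta) - psi(beta))       # Eq. 20
        alpha = max(alpha + lr * g_a, 1e-6)   # keep the parameters positive
        beta = max(beta + lr * g_b, 1e-6)
    return alpha, beta
```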

The parameters \(\mu\), \(\lambda\), \(\sigma_U\), and \(\sigma_G\) are then sequentially estimated by maximizing the log-likelihood computed from Eq. 10 in a gradient-based manner. The parameters \({\bf w}_i^u\) and \({\bf w}_j^g\) are estimated by jointly considering Eqs. 5 and 10 in the same manner. The prediction function h(; η, ρ) with the Bernoulli distribution is essentially equivalent to logistic regression with \(\max_{d \in \{1, \cdots, D\}}\min \{y^u_d, y^g_d\}\) as the training feature. At the first iteration of the algorithm, the prediction function is not involved in the process. After obtaining the first estimate of the latent topics, we compute \(\max_{d \in \{1, \cdots, D\}}\min \{y^u_d, y^g_d\}\) for each pair (u, g) and update the relation matrix R by setting entries to zero or one and leaving the rest unknown according to the max-min values. In later iterations, the pairs with known values are used to estimate the parameters.

6 Experiment

In this section, we conduct experiments to evaluate the performance of our proposed approach for Flickr group recommendation and demonstrate its effectiveness by comparison with other widely used techniques.

6.1 Setup

6.1.1 Dataset

No public benchmark is available for evaluating Flickr group recommendation, so we collected a dataset using the Flickr API. We crawled 200 popular Flickr groups via tag-based group search in Flickr, by selecting 20 popular tags and taking the top 10 groups for each tag. We then collected the information of the 53,858 users in these 200 groups, including the photos and associated tags they uploaded as well as their profile information. The number of photos in our dataset is 6,156,124, including 3,711,319 user-uploaded photos, 2,892,271 user-favored photos, and 142,495 photos in the group pools.

For each photo, we extract two types of features: a visual feature and a textual feature. We use dense-sampling-based visual words as the visual feature, which has been reported to yield better performance than raw visual features such as color or texture. Specifically, we divide the image into uniformly distributed blocks. Raw SIFT features (Lowe 2004) are then extracted for each block and quantized into visual words (Sivic and Zisserman 2009) using a vocabulary of 1,024 visual words obtained by k-means clustering. A histogram of visual words is formed to represent the visual content. We also extract textual features from the tags associated with the images. As preprocessing, we remove stop words with the Snowball stop word list and filter out tags whose frequency in the whole corpus is less than 5, which are regarded as noise or typos. After modeling a continuous latent topic representation for the textual features by latent Dirichlet allocation, we concatenate the textual and visual features into a single vector to represent the image; a sketch of the visual-word pipeline follows.
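The sketch below illustrates such a bag-of-visual-words pipeline with OpenCV and scikit-learn, which are our own tooling choices; the paper does not specify its implementation, grid step, or block size, so those values here are hypothetical.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

sift = cv2.SIFT_create()

def dense_descriptors(gray, step=16, size=16):
    """Compute SIFT descriptors on a uniform grid of keypoints
    (one keypoint per block), approximating dense sampling."""
    kps = [cv2.KeyPoint(float(x), float(y), float(size))
           for y in range(0, gray.shape[0], step)
           for x in range(0, gray.shape[1], step)]
    _, desc = sift.compute(gray, kps)
    return desc

# Vocabulary of 1,024 visual words learned from descriptors of a
# training subset (all_desc stacks descriptors from many images):
# kmeans = MiniBatchKMeans(n_clusters=1024).fit(all_desc)

def bow_histogram(gray, kmeans):
    """Quantize descriptors to visual words and build an L1-normalized
    histogram representing the image's visual content."""
    words = kmeans.predict(dense_descriptors(gray))
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```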

6.1.2 Evaluation

The performance evaluation metrics for group recommendation (Herlocker et al. 2004) fall into three classes: predictive accuracy metrics, classification accuracy metrics, and rank accuracy metrics. In this paper, we choose classification accuracy as the evaluation metric, which measures the frequency with which a recommender system makes correct or incorrect decisions. Specifically, we adopt one popular metric, precision at the first N positions (Precision@N), to measure the percentage of recommended groups that the user would like to join; a small sketch of this metric follows. We randomly sample 10% of the user-group links (i.e., pairs where the user has joined the group) for prediction, sample another 10% for validation, and use the remaining user-group links as the training set.
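A minimal sketch of this metric is given below; the function names are our own, and per-user rankings and held-out links are assumed to be given.

```python
def precision_at_n(ranked_groups, held_out_groups, n):
    """Fraction of the top-N recommended groups that appear in the
    user's held-out user-group links."""
    hits = sum(1 for g in ranked_groups[:n] if g in held_out_groups)
    return hits / float(n)

def mean_precision_at_n(rankings, held_out, n):
    """Average Precision@N over all test users; rankings and held_out
    map each user to a ranked group list and a held-out group set."""
    users = list(rankings)
    return sum(precision_at_n(rankings[u], held_out[u], n) for u in users) / len(users)
```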

6.2 Illustration

We now show experimental results on Flickr group recommendation. We first present a visual illustration and then demonstrate the convergence and correctness of the inference process.

6.2.1 Visual illustration

Figure 6 shows the latent topics learned by the joint topic model. We show representative images from four latent topics of different users, together with their associated tags and owners. We make the following observations. On the one hand, the images are grouped into different semantic meanings. On the other hand, photos in the same topic may be similar in low-level visual content (e.g., color and texture) but have missing or noisy tags, or similar in textual information but dissimilar in visual content, or even dissimilar in both visual and textual information yet share the same semantic meaning, thanks to the user-group relations. In summary, this result confirms that both the user-group relations and the content features are helpful for latent topic discovery.

Fig. 6 Visual examples for four randomly selected latent topics

6.2.2 Convergence

The following illustrates the convergence of the inference process. Figure 7a shows the log-joint probability at each iteration and the maximum log-joint probability over this and all previous iterations. We can see that the probability reaches a high value within the first 10–20 iterations and then stabilizes. A theoretical analysis of the convergence of Gibbs sampling can be found in (Liu 2002). Our experiments validate that the Gibbs-sampling-based inference is effective for our model.

Fig. 7 a The change of the log-joint probability as the number of iterations increases. b The change of the mean absolute error as the number of iterations increases

Figure 7b shows the mean absolute error (MAE) at each iteration (MAE@iteration) and the minimum MAE over this and all previous iterations (Minimum MAE). The MAE value is calculated as \(\hbox{E}|r_{ug}-r_{ug}^{\prime}|\); it measures the error between the actual value \(r_{ug}\) and the predicted score \(r_{ug}^{\prime}\), with lower values indicating better performance. Consistent with Fig. 7a, the MAE drops sharply within the first 10–20 iterations and decreases steadily during the later learning progress.

6.2.3 Latent topics

The best dimensionality of the latent space, i.e., the number of latent topics D, is determined on the validation set. We test different values of D from 6, 8, 10 up to 50 and evaluate the results on the validation set by calculating Precision@N for each D. The results are shown in Fig. 8. One can see that the best performance is reached when D is set to 20. A smaller D may not be able to discriminate topics with different meanings, while a larger D may generate too many latent topics that separate photos with similar meanings into different topics. Although validation can help find the optimal D, this process is somewhat time-consuming for large-scale datasets, since various values of D have to be checked. In the future, we will investigate more efficient schemes to determine the number of topics.

Fig. 8 The performance with different numbers of latent topics

6.3 Comparison

We compare the joint topic model with several state-of-the-art techniques adopted in other recommendation applications for social media, which can be directly applied to our problem. Following the taxonomy of recommendation techniques in the related work, the competitors fall into the following categories.

  • Content. We compare our approach with content analysis approaches, which convert all sources of features into a unified space using methods such as latent semantic analysis (Deerwester et al. 1990), probabilistic latent semantic analysis (Hofmann 1999, 2004; Negoescu et al. 2009; Negoescu and Gatica-Perez 2008), and latent Dirichlet allocation (Blei et al. 2003). We tested all three types of methods and found that their performances are similar. In the comparison, we select the best result from the three methods and denote it by content. In addition, we directly train a classifier for each group over its photos using support vector machines; to recommend groups to a user, we classify the user's photos into the groups and then use the classification results to vote for groups for the user.

  • Collaborative filtering. We also compare with collaborative filtering methods (Herlocker et al. 2004; Koren and Bell 2011; Su and Khoshgoftaar 2009) and report the result of a recent state-of-the-art collaborative filtering approach, singular value decomposition++ (SVD++) (Koren 2008, 2010). SVD++ improves low-rank matrix factorization-based methods and handles cases with missing data.

  • Hybrid. We also compare our approach with one representative hybrid method, RankBoost (Freund et al. 2003). This method has been applied to Flickr tag recommendation (Wu et al. 2009) by combining the recommendations from content and collaborative filtering and has been shown to achieve the best results. In the comparison, this method is denoted by hybrid.

Figure 9 shows the comparison results, in which we can clearly see the competitive performance of our approach. More specifically, the content-based approach performs worst, due to the limited effectiveness of low-level visual features and the noise in the tags. The collaborative filtering approach performs much better. The hybrid approach, combining both content and collaborative filtering, achieves the best results among the baselines, as expected, which is consistent with results in other recommendation systems.

Fig. 9 Performance comparison of our approach (JTM) with other state-of-the-art recommendation methods

Our approach simultaneously makes use of the content and the relations between users and groups, while the content-based approach and the collaborative filtering-based approach each explore only one cue; hence, our approach is more capable of modeling the problem. In addition, the joint discovery of latent topics for users and groups makes the discovered topics more accurate and better able to capture the underlying user-group interaction. Compared with previous hybrid methods, our approach models the content using latent topics learned from the data. This extracts more discriminative and informative features and removes information that is useless for our specific problem, relation mining, whereas keeping the whole content may hurt performance when useless information is retained. Moreover, the prediction function and the latent topic learning benefit from each other: the prediction function is adapted to the latent topics, and it in turn helps learn latent topics that make the prediction stronger. All of these factors make our approach more robust and enable it to perform best among the state-of-the-art approaches.

7 Conclusion

In this paper, we systematically study the problem of recommending groups to users in social media networks. We propose a joint topic model that makes use of content features, collaborative information, and social relations in an integrated framework. The proposed approach discovers topics for users and groups jointly and simultaneously learns the prediction function that matches latent topics. These characteristics make the discovered latent structure reflect the underlying user-group interaction more accurately, which results in more effective group recommendation. An efficient inference algorithm based on Gibbs sampling is proposed. Experimental results on Flickr group recommendation show that our approach is more effective and efficient than state-of-the-art group recommendation methods for social media networks.