Elsevier

Neurocomputing

Volume 275, 31 January 2018, Pages 1601-1613

Community-based seeds selection algorithm for location aware influence maximization

https://doi.org/10.1016/j.neucom.2017.10.007

Abstract

In this paper, we study the location aware influence maximization problem, which finds a seed set that maximizes the influence spread over targeted users for a given query. In particular, we consider users who have geographical preferences on queries as targeted users. One challenge is how to find the targeted users and compute their preferences efficiently for a given query. To address it, we devise a PR-tree index structure based on the R-tree, in which each tree node stores the location and information about users’ geographical preferences. By traversing the PR-tree from the root in depth-first order, we can efficiently find the targeted users. Another challenge is to devise an algorithm for efficient seed selection. To address it, we adopt the maximum influence arborescence (MIA) model to approximate the influence spread, and propose an efficient community-based seeds selection (CSS) algorithm. The CSS algorithm finds seeds efficiently by constructing PR-tree based indexes offline, which precompute users’ community-based influences, and by preferentially computing online the marginal influences of the users most likely to be selected as seeds. In particular, we propose a community detection algorithm that first computes social influence based similarities under the MIA model and then applies spectral clustering to find good communities of the social network. Experimental results on real-world datasets collected from DoubanEvent show that the proposed algorithm is more efficient than several state-of-the-art algorithms while maintaining a large influence spread.

Introduction

In recent years, social networks have become prevalent platforms for product promotion (e.g., viral marketing). Previous studies have shown that the viral marketing strategy is more effective than TV or newspaper advertising. Influence maximization aims to find a certain number of users (called seeds) who maximize the expected number of influenced users (called influence spread) through the word-of-mouth effect. It is the key problem behind viral marketing in social networks [1] and has been extensively studied recently [2], [3], [4], [5]. With the proliferation of geo-social networks (such as Foursquare and Facebook), location-based product promotion is becoming more necessary in real applications. For instance, a newly opened restaurant in Chelsea, New York, wants to promote itself on a social network platform through viral marketing. Its strategy is to provide free meals to a limited number k of users, who can maximize the influence spread over the targeted users through the word-of-mouth effect and attract them to dine there. The targeted users of this promotion are those who have geographical preferences on Chelsea, New York (i.e., those who frequently dine near Chelsea, New York). In this promotion, how to select these k users is critical.

Recently, some researchers have taken the location information of the promotion into consideration and extended traditional influence maximization to location aware influence maximization. Zhou et al. [6] take users’ historical mobility behaviors into consideration, and devise two heuristic algorithms that find seeds using their proposed two-phase information diffusion model. However, they do not consider the co-influences of the selected seeds. Li et al. [7] study how to find the k users who have the highest influence over a group of users in a specified region. Wang et al. [8] define distance-aware influence maximization, which considers the distance between users and the promoted location. However, these studies all assume that each user has a known fixed location, which is not realistic. In fact, users in social networks usually check in at multiple places and have geographical preferences on multiple locations.

Different from these works, in this paper we define a location aware influence maximization (LAIM) problem. In particular, given a social network in which each user has different preferences on different locations, and a query with a spatial region R and an integer k, the LAIM problem requires finding k seeds that maximize the influence spread over the targeted users, i.e., those who have geographical preferences on the query region R. We show that, under the maximum influence arborescence (MIA) model, the LAIM problem is NP-hard, and its influence spread is submodular and monotone. Therefore, we can extend the existing greedy algorithm [3], which has a 1 − 1/e approximation ratio, to solve this problem. However, since the extended greedy algorithm must compute the exact marginal influence of each user and select the user with the largest marginal influence in each iteration, it suffers from poor efficiency. Therefore, we focus on designing an efficient solution for the LAIM problem.
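To make the cost of the greedy approach concrete, the following is a minimal sketch of generic greedy seed selection for a monotone submodular spread function. It is not the extended algorithm of the paper: the function names, the `spread` callback and the toy coverage function standing in for the influence spread are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Set


def greedy_seeds(candidates: Sequence[str], k: int,
                 spread: Callable[[Set[str]], float]) -> List[str]:
    """Pick k seeds, each time taking the user with the largest marginal gain."""
    seeds: List[str] = []
    chosen: Set[str] = set()
    current = spread(chosen)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        # Every remaining user is re-evaluated in every round: this is the
        # source of the poor efficiency discussed in the text.
        for v in candidates:
            if v in chosen:
                continue
            gain = spread(chosen | {v}) - current
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidates available
            break
        seeds.append(best)
        chosen.add(best)
        current += best_gain
    return seeds


# Toy stand-in for influence spread (hypothetical data): each user reaches a
# fixed set of targeted users, and the spread is the size of the union.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(seed_set: Set[str]) -> float:
    return float(len(set().union(*(reach[v] for v in seed_set))))


print(greedy_seeds(list(reach), 2, coverage))
```

With n candidates, this sketch calls `spread` on the order of n times per round, which is what motivates index-based and lazy alternatives.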

There are two main challenges in solving the LAIM problem efficiently. The first is how to efficiently identify the targeted users and compute their geographical preferences for a given query. Thanks to the check-in information on social networks (e.g., Twitter and Foursquare), we can easily obtain users’ geographical preferences. Leveraging such information, we devise a PR-tree index structure for finding the targeted users, where each tree node stores the location and information about users’ geographical preferences. In particular, for a given query, we traverse the PR-tree from the root in depth-first order and prune the tree nodes whose regions do not intersect the query region, so that the targeted users are found efficiently.
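The traversal described above can be sketched as follows. The actual PR-tree node layout is not given in this excerpt, so the `Node` class, the rectangle representation and the per-location check-in dictionaries are hypothetical; only the depth-first order and the pruning of non-intersecting nodes come from the text.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed axis-aligned
Entry = Tuple[float, float, Dict[str, int]]  # (x, y, {user: check-in count})


class Node:
    """Hypothetical R-tree-like node: a bounding rectangle plus children or leaf entries."""

    def __init__(self, mbr: Rect, children: Optional[List["Node"]] = None,
                 entries: Optional[List[Entry]] = None) -> None:
        self.mbr = mbr
        self.children = children or []
        self.entries = entries or []


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def checkins_in_region(root: Node, region: Rect) -> Dict[str, int]:
    """Depth-first traversal that skips every subtree not intersecting the query region."""
    counts: Dict[str, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if not intersects(node.mbr, region):
            continue  # prune: nothing below this node can lie in the region
        for x, y, users in node.entries:
            if contains(region, x, y):
                for user, n in users.items():
                    counts[user] = counts.get(user, 0) + n
        stack.extend(node.children)
    return counts


leaf1 = Node((0, 0, 2, 2), entries=[(1, 1, {"u1": 3, "u2": 1}), (2, 2, {"u1": 1})])
leaf2 = Node((5, 5, 8, 8), entries=[(6, 6, {"u3": 4})])
root = Node((0, 0, 8, 8), children=[leaf1, leaf2])
print(checkins_in_region(root, (0, 0, 1.5, 1.5)))
```

Users with a non-zero count are the targeted users for the query; dividing each count by the user's total number of check-ins would give a geographical preference.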

The second challenge is to devise algorithms for efficient seed selection. To address this challenge, we propose a community-based seeds selection (CSS) algorithm. The basic idea is to divide the whole network into communities and then find seeds within communities instead of in the whole network, i.e., we use users’ influences within their communities to approximate their influences in the whole network. Our assumption is that each node’s influence propagation is limited to the community in which it resides. In particular, we define a community as a group of users who have frequent contact and are more likely to influence each other within the group than outside of it. To find good communities, we propose a community detection algorithm that first computes social influence based similarities under the maximum influence arborescence (MIA) diffusion model, and then applies a spectral clustering algorithm for weighted directed graphs. To find the seeds efficiently, based on the detected communities, our CSS algorithm first constructs PR-tree based indexes offline to store users’ community-based influences. Then, for a given query, it assembles the corresponding indexes online to construct a priority queue for each community, which stores the community-based influences of users in descending order. Finally, based on the community-based priority queues, it finds seeds efficiently by preferentially computing online the marginal influences of the users most likely to be selected as seeds. In particular, we devise a fast method to compute an upper bound on the marginal influence (i.e., the estimated marginal influence) of each user, which saves time during seed selection.
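The idea of evaluating the most promising users first, guided by upper bounds kept in a priority queue, can be illustrated with the well-known lazy-forward (CELF-style) technique. This is a swapped-in illustration, not the CSS algorithm itself: the paper's bound computation and its per-community queues are not reproduced here, and the single queue, function names and toy data are assumptions. The sketch relies only on submodularity, under which a marginal gain computed for an earlier seed set is an upper bound on the current one.

```python
import heapq
from typing import Callable, List, Sequence, Set, Tuple


def lazy_greedy(candidates: Sequence[str], k: int,
                spread: Callable[[Set[str]], float]) -> Tuple[List[str], int]:
    """Return (seeds, number of marginal-gain evaluations)."""
    chosen: Set[str] = set()
    seeds: List[str] = []
    current = spread(chosen)
    # Heap entries: (-upper bound, tie-break index, user, round in which the bound was computed).
    heap = [(-(spread({v}) - current), i, v, 0) for i, v in enumerate(candidates)]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(seeds) < k:
        neg_bound, i, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The bound was computed against the current seed set, so it is
            # exact and no other user can beat it.
            seeds.append(v)
            chosen.add(v)
            current += -neg_bound
        else:
            # Stale bound: recompute the exact gain and push it back.
            gain = spread(chosen | {v}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds, evaluations


# Toy stand-in for influence spread (hypothetical data).
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(seed_set: Set[str]) -> float:
    return float(len(set().union(*(reach[v] for v in seed_set))))


print(lazy_greedy(list(reach), 2, coverage))
```

On this toy input the second seed is found after re-evaluating a single user, whereas a plain greedy pass would re-evaluate all three remaining users.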

In summary, the major contributions of our work are listed as follows:

  • We formally introduce the location aware influence maximization (LAIM) problem, which is NP-hard, and propose an efficient solution to it.

  • To efficiently identify the targeted users and compute their preferences for given queries, we propose a PR-tree index structure, in which each tree node stores the location and information of users’ geographical preferences.

  • Based on spectral clustering, we devise a social influence based community detection algorithm that adopts the MIA model. In addition, we propose an efficient community-based seeds selection algorithm that uses offline community-based influence indexes and online community-based priority queues.

  • We conduct a comprehensive performance evaluation on real-world datasets. Experimental results show the effectiveness and efficiency of our proposed algorithm.
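As background for the MIA model named in the contributions above: it approximates the influence of one user on another by the single most probable propagation path between them, usually ignoring paths whose probability falls below a threshold. A minimal sketch of that path computation follows, using Dijkstra's algorithm on negative log probabilities. The graph encoding, the function name and the `theta` parameter name are assumptions for illustration, not taken from the paper.

```python
import heapq
import math
from typing import Dict


def max_influence_paths(graph: Dict[str, Dict[str, float]], source: str,
                        theta: float = 0.0) -> Dict[str, float]:
    """Map each reachable node to the probability of its most probable path from source.

    graph[u][v] is the propagation probability of edge (u, v). Paths with
    probability below theta are discarded.
    """
    best = {source: 1.0}
    heap = [(0.0, source)]  # (negative log of path probability, node)
    done = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, p in graph.get(u, {}).items():
            if p <= 0.0:
                continue
            cand = best[u] * p
            if cand >= theta and cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-math.log(cand), v))
    return best


g = {"a": {"b": 0.5, "c": 0.1}, "b": {"c": 0.5}}
print(max_influence_paths(g, "a"))
```

In the example, the two-hop path from a to c (0.5 × 0.5) is preferred over the direct edge (0.1), and raising `theta` above 0.25 removes c from the result altogether.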

The remainder of this paper is organized as follows. We review the related work in Section 2. Section 3 presents necessary background on influence maximization and influence spread computation. In Section 4, we formulate our location aware influence maximization (LAIM) problem. In Section 5, we present our community-based seeds selection solution, followed by experimental evaluation in Section 6. Finally, Section 7 concludes the paper.

Section snippets

Influence maximization

Influence maximization in social networks. Influence maximization (IM) was first described as an algorithmic problem by Domingos and Richardson [9]. Then, Kempe et al. [1] formulated the IM problem as a discrete optimization problem. They proved that this problem is NP-hard and gave a greedy algorithm with a provable approximation guarantee (i.e., 1 − 1/e). However, the greedy algorithm needs to execute the Monte Carlo simulation [10] to obtain the approximation ratio, which faces the drawback of high

Influence maximization (IM)

A social network is modeled as a directed graph G=(V,E), where nodes in $V=\{v_1, v_2, \ldots, v_n\}$ model the users in the network and edges in E model the friendships or follow relationships between them. The influence maximization (IM) can be formally defined as follows [1].

Definition 1

Given a social network G=(V,E), a specific propagation model C and an integer k, the influence maximization (IM) problem is to find a set of seed nodes S in G, where |S|=k, such that under model C, the influence spread of S, denoted

Query model

A location aware influence maximization (LAIM) query Q consists of a region R and a budget number k (i.e., the number of seeds), denoted by Q=(R,k).

Data model

Users in the social network have geographical preferences. For user v, we define the ratio γ(v, Q) (0 ≤ γ(v, Q) ≤ 1) of v’s local check-ins in R over the total as the user’s geographical preference on Q:

$$\gamma(v,Q)=\frac{\sum_{l\in C(v)\cap R} n_v(l)}{\sum_{l\in C(v)} n_v(l)},$$

where C(v) is the set of locations at which user v has checked in and $n_v(l)$ denotes the number of check-ins of v at location l.
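As a small worked example of this ratio, the sketch below computes γ(v, Q) for one user. It assumes, for illustration only, that locations are (x, y) points, that R is an axis-aligned rectangle, and that a user with no check-ins has preference 0; none of these choices is stated in the text.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed encoding of R


def geo_preference(checkins: Dict[Point, int], region: Rect) -> float:
    """Share of one user's check-ins that fall inside the query region."""
    total = sum(checkins.values())
    if total == 0:
        return 0.0  # convention chosen here for users without check-ins
    xmin, ymin, xmax, ymax = region
    local = sum(n for (x, y), n in checkins.items()
                if xmin <= x <= xmax and ymin <= y <= ymax)
    return local / total


# Hypothetical user: 3 check-ins at (1, 1) and 1 check-in at (5, 5).
user_checkins = {(1.0, 1.0): 3, (5.0, 5.0): 1}
print(geo_preference(user_checkins, (0.0, 0.0, 2.0, 2.0)))  # 0.75
```

Three of the four check-ins lie inside the region, so the preference is 3/4.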

Location aware influence maximization

Definition 2

Our proposed solution

An overall view of our proposed solution is illustrated in Fig. 2. Our framework consists of three components: PR-tree index structure, community detection algorithm and community-based seeds selection algorithm.

PR-tree index structure. Based on R-tree, we propose a PR-tree index structure to find targeted users efficiently. Each node of the PR-tree index stores the location and information of users’ geographical preferences. For online queries, we traverse the PR-tree from the root in

Experiments

In this section, we study the performance of our method and compare our method with the state-of-the-art algorithms on real-world datasets.

Conclusion

In this paper, we study the location aware influence maximization (LAIM) problem in social networks. Under the MIA model, we prove that this problem is NP-hard, and that the influence spread is monotone and submodular. To obtain the targeted users (i.e., those who have geographical preferences on the query region), we devise a PR-tree index structure. In addition, to find seeds efficiently for the LAIM problem, we propose a community-based seeds selection (CSS) algorithm, which iteratively selects users

Acknowledgment

The work was supported by the National Natural Science Foundation of China under grant 61502047 and by the Co-construction Program with the Beijing Municipal Commission of Education.

Xiao Li received her M.S. degree from Shandong Normal University in 2014. She is currently a Ph.D. candidate at Beijing University of Posts and Telecommunications, China. Her major is Computer Science. Her research interests include social influence analysis and machine learning.

References (51)

  • T. Ma et al., LED: a fast overlapping communities detection algorithm based on structural clustering, Neurocomputing (2016)
  • X. Wang et al., Uncovering fuzzy communities in networks with structural similarity, Neurocomputing (2016)
  • R. Ghosh et al., Community detection using a measure of global influence, Advances in Social Network Mining and Analysis (2010)
  • A. Guttman, R-trees: a dynamic index structure for spatial searching, Proceedings of the SIGMOD (1984)
  • M. Meilă et al., Clustering by weighted cuts in directed graphs, Proceedings of the SIAM Conference on Data Mining (SDM) (February 2007)
  • D. Kempe et al., Maximizing the spread of influence through a social network, Proceedings of the 2003 SIGKDD (2003)
  • J. Leskovec et al., Cost-effective outbreak detection in networks, Proceedings of the 2007 SIGKDD (2007)
  • W. Chen et al., Scalable influence maximization for prevalent viral marketing in large-scale social networks, Proceedings of the 2010 SIGKDD (2010)
  • C. Borgs, M. Brautbar, J.T. Chayes, B. Lucier, Influence maximization in social networks: towards an optimal...
  • W. Chen et al., Efficient influence maximization in social networks, Proceedings of the SIGKDD (2009)
  • T. Zhou et al., Location-based influence maximization in social networks, Proceedings of the Conference on Information and Knowledge Management (2015)
  • G. Li et al., Efficient location-aware influence maximization, Proceedings of the 2014 SIGMOD (2014)
  • P. Domingos et al., Mining the network value of customers, Proceedings of the 2001 SIGKDD (2001)
  • D.J. MacKay, Introduction to Monte Carlo methods, Learning in Graphical Models (1998)
  • A. Goyal et al., CELF++: optimizing the greedy algorithm for influence maximization in social networks, Proceedings of the 2011 WWW (2011)

    Xiang Cheng received the Ph.D. degree from Beijing University of Posts and Telecommunications, China, in 2013. He is currently an Associate Professor at the Beijing University of Posts and Telecommunications. His research interests include data mining and data privacy.

    Sen Su received the Ph.D. degree in Computer Science from the University of Electronic Science and Technology, China, in 1998. He is currently a Professor at the Beijing University of Posts and Telecommunications. His research interests include distributed systems and service computing.

    Chenna Sun received the B.S. degree from Beijing University of Posts and Telecommunications, China, in 2016. She is currently a M.S. candidate at Beijing University of Posts and Telecommunications. Her major is Computer Science. Her research interests include social network analysis and machine learning.
