Community-based seeds selection algorithm for location aware influence maximization

doi:10.1016/j.neucom.2017.10.007

Neurocomputing

Volume 275, 31 January 2018, Pages 1601-1613

https://doi.org/10.1016/j.neucom.2017.10.007

Abstract

In this paper, we study the location aware influence maximization problem, which aims to find a seed set that maximizes the influence spread over targeted users for a given query. In particular, we regard users who have geographical preferences on a query as targeted users. One challenge of the problem is how to find the targeted users and compute their preferences efficiently for given queries. To address this challenge, we devise a PR-tree index structure based on the R-tree, in which each tree node stores location information and users' geographical preferences. By traversing the PR-tree from the root in depth-first order, we can efficiently find the targeted users. Another challenge is to devise an algorithm for efficient seeds selection. To solve this challenge, we adopt the maximum influence arborescence (MIA) model to approximate the influence spread, and propose an efficient community-based seeds selection (CSS) algorithm. The proposed CSS algorithm finds seeds efficiently by constructing, offline, PR-tree based indexes that precompute users' community-based influences, and by preferentially computing, online, the marginal influences of the users who are most likely to be selected as seeds. In particular, we propose a community detection algorithm which first computes social influence based similarities under the MIA model and then adopts the spectral clustering algorithm to find optimal communities of the social network. Experimental results on real-world datasets collected from DoubanEvent demonstrate that our proposed algorithm outperforms several state-of-the-art algorithms in efficiency, while keeping a large influence spread.

Introduction

In recent years, social networks have become prevalent platforms for product promotion (e.g., viral marketing). Previous studies have shown that viral marketing is more effective than TV or newspaper advertising. Influence maximization, which aims to find a certain number of users (called seeds) that maximize the expected number of influenced users (called the influence spread) through the word-of-mouth effect, is the key problem behind viral marketing in social networks [1], and has been extensively studied recently [2], [3], [4], [5]. With the proliferation of geo-social networks (such as Foursquare and Facebook), location-based product promotion is becoming increasingly important in real applications. For instance, a newly opened restaurant in Chelsea, New York, wants to promote itself on a social network platform through viral marketing. Its promotion strategy is to provide free meals to a limited number k of users, who can maximize the influence spread over the targeted users through the powerful word-of-mouth effect and attract them to dine there. Obviously, the targeted users of this promotion are those who have geographical preferences on Chelsea, New York (i.e., those who frequently dine near Chelsea, New York). In this promotion, how to select these k users is critical.

Recently, some researchers have taken the location information of the promotion into consideration and extended traditional influence maximization to location aware influence maximization. Zhou et al. [6] take users' historical mobility behaviors into consideration and devise two heuristic algorithms to find seeds using their proposed two-phase information diffusion model. However, they do not consider the co-influences of the selected seeds. Li et al. [7] study how to find k users who have the highest influence over a group of users in a specified region. Wang et al. [8] define distance-aware influence maximization, which considers the distance between users and the promoted location. However, these studies all assume that each user has a known, fixed location, which is not realistic. In fact, users in social networks usually check in at multiple places and have geographical preferences on multiple locations.

Different from these works, in this paper, we define a location aware influence maximization (LAIM) problem. In particular, given a social network in which each user has different preferences on different locations, and a query with a spatial region R and an integer k, the LAIM problem is to find k seeds that maximize the influence spread over the targeted users, i.e., those who have geographical preferences on the query region R. We show that, under the maximum influence arborescence (MIA) model, the LAIM problem is NP-hard, and its influence spread function is monotone and submodular. Therefore, we could extend the existing greedy algorithm [3], which has a $1 - 1/e$ approximation ratio, to solve this problem. However, since the extended greedy algorithm must compute the exact marginal influence of every user and select the user with the largest marginal influence in each iteration, it suffers from poor efficiency. Therefore, we focus on designing an efficient solution to the LAIM problem.
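The plain greedy baseline discussed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a black-box `spread` oracle, and the toy `coverage_spread` function (with made-up, hypothetical `reach` and `gamma` data) merely stands in for the MIA-based influence spread over targeted users. Like the real spread function, it is monotone and submodular, so greedy keeps its $1 - 1/e$ guarantee. The sketch makes the cost visible: every round evaluates the spread once per remaining candidate.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_seeds(candidates: Iterable[Hashable],
                 spread: Callable[[FrozenSet[Hashable]], float],
                 k: int) -> List[Hashable]:
    """Plain greedy seed selection: each round adds the candidate whose
    marginal influence spread(S + u) - spread(S) is largest."""
    pool = list(candidates)
    seeds: List[Hashable] = []
    current = spread(frozenset())
    for _ in range(min(k, len(pool))):
        best, best_gain = None, float("-inf")
        for u in pool:
            if u in seeds:
                continue
            # Exact marginal influence of u: one full spread evaluation per candidate.
            gain = spread(frozenset(seeds + [u])) - current
            if gain > best_gain:
                best, best_gain = u, gain
        seeds.append(best)
        current += best_gain
    return seeds


# Hypothetical toy stand-in for the influence spread over targeted users
# (an assumption, not the MIA model): each candidate "reaches" a fixed set
# of targeted users, and each reached user contributes its preference weight.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}
gamma = {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.5, 5: 1.0}


def coverage_spread(seed_set: FrozenSet[Hashable]) -> float:
    covered = set().union(*(reach[u] for u in seed_set))
    return sum(gamma[v] for v in covered)


print(greedy_seeds(reach, coverage_spread, 2))  # ['a', 'c']
```

In the first round `a` covers weight 2.5; in the second, `c` adds 1.5 while `b` adds only 0.5, which is the diminishing-returns behavior that submodularity describes.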

There are two main challenges in solving the LAIM problem efficiently. The first is how to efficiently identify the targeted users and compute their geographical preferences for given queries. Thanks to the check-in information on social networks (e.g., Twitter and Foursquare), we can easily obtain users' geographical preferences. Leveraging such information, we devise a PR-tree index structure for finding the targeted users, in which each tree node stores location information and users' geographical preferences. In particular, for a given query, we traverse the PR-tree from the root in depth-first order and prune the tree nodes whose regions do not intersect the query region, so that the targeted users are found efficiently.
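A depth-first PR-tree search with region pruning might look like the sketch below. The text does not specify what a PR-tree node stores beyond location and preference information, so the layout here is an assumption: every node keeps a minimum bounding rectangle (MBR) as in an ordinary R-tree, leaves keep hypothetical `(user, x, y, count)` check-in entries, and each user's total check-in count lives in a separate `total` map. Subtrees whose MBR does not intersect the query region are skipped, and the surviving counts yield, for each targeted user, the fraction of that user's check-ins that fall inside the region.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


@dataclass
class PRNode:  # hypothetical node layout, not the paper's exact structure
    mbr: Rect
    children: List["PRNode"] = field(default_factory=list)
    # Leaf payload: (user, x, y, number of check-ins of that user at (x, y)).
    entries: List[Tuple[str, float, float, int]] = field(default_factory=list)


def targeted_users(root: PRNode, region: Rect,
                   total: Dict[str, int]) -> Dict[str, float]:
    """Depth-first traversal; returns {user: in-region check-ins / total}."""
    in_region: Dict[str, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if not intersects(node.mbr, region):
            continue  # prune: no check-in below this node can lie inside the region
        for user, x, y, n in node.entries:
            if contains(region, x, y):
                in_region[user] = in_region.get(user, 0) + n
        stack.extend(reversed(node.children))  # visit children left to right
    return {u: c / total[u] for u, c in in_region.items()}


leaf1 = PRNode((0, 0, 2, 2), entries=[("u1", 1, 1, 3), ("u2", 2, 2, 1)])
leaf2 = PRNode((5, 5, 7, 7), entries=[("u1", 6, 6, 1), ("u3", 5, 5, 4)])
root = PRNode((0, 0, 7, 7), children=[leaf1, leaf2])
total = {"u1": 4, "u2": 1, "u3": 4}
print(targeted_users(root, (0, 0, 3, 3), total))  # {'u1': 0.75, 'u2': 1.0}
```

For the query region `(0, 0, 3, 3)` the whole right subtree is pruned without inspecting its entries, and `u3`, whose check-ins all lie outside the region, is never touched.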

The second challenge is to devise an algorithm for efficient seeds selection. To address this challenge, we propose a community-based seeds selection (CSS) algorithm. The basic idea is to divide the whole network into communities and then find seeds within communities instead of the whole network, i.e., we use users' influences within their communities to approximate their influences in the whole network. Our assumption is that each node's influence propagation is limited to the community in which it resides. Here, we define a community as a group of users who have frequent contact and are more likely to influence each other within the group than outside of it. To find good communities, we propose a community detection algorithm which first computes social influence based similarities under the maximum influence arborescence (MIA) diffusion model and then adopts the spectral clustering algorithm for weighted directed graphs.

To find the seeds efficiently based on the detected communities, our CSS algorithm first constructs, offline, PR-tree based indexes that store users' community-based influences. Then, for a given query, it assembles the corresponding indexes online to construct a priority queue for each community, which stores the community-based influences of users in descending order. Finally, based on these community-based priority queues, it finds seeds efficiently by preferentially computing, online, the marginal influences of the users who are most likely to be selected as seeds. In particular, we devise a fast method to compute an upper bound of users' marginal influences (i.e., the estimated marginal influence), which saves time in seeds selection.
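The online phase can be approximated by a lazy-evaluation loop over per-community max-heaps, sketched below under explicit assumptions. The paper's fast upper-bound method is not given here, so this sketch substitutes a CELF-style rule: a user's most recently computed marginal gain serves as its upper bound, which stays valid because marginal gains only shrink as seeds are added (submodularity). Following the stated community assumption, the hypothetical `marginal(u, S)` callback depends only on the seeds already chosen in `u`'s own community. `comm_of`, `comm_influence`, `toy_marginal`, and the toy data are all illustrative names, not the authors' API.

```python
import heapq
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def css_select(comm_of: Dict[str, int],
               comm_influence: Dict[str, float],
               marginal: Callable[[str, FrozenSet[str]], float],
               k: int) -> List[str]:
    """Lazy seed selection over per-community priority queues (a sketch).

    comm_of[u]: u's community; comm_influence[u]: precomputed community-based
    influence (the initial upper bound); marginal(u, S): u's marginal influence
    given the seeds S already chosen in u's own community."""
    # One max-heap per community (heapq is a min-heap, so keys are negated).
    queues: Dict[int, List[Tuple[float, str]]] = {}
    for u, c in comm_of.items():
        queues.setdefault(c, []).append((-comm_influence[u], u))
    for q in queues.values():
        heapq.heapify(q)
    chosen: Dict[int, Set[str]] = {c: set() for c in queues}
    seeds: List[str] = []
    while len(seeds) < k:
        live = [c for c, q in queues.items() if q]
        if not live:
            break
        # Community whose head carries the largest stored upper bound.
        c = max(live, key=lambda cid: -queues[cid][0][0])
        _, u = heapq.heappop(queues[c])
        gain = marginal(u, frozenset(chosen[c]))
        # Largest upper bound still waiting in any queue.
        rest = max((-q[0][0] for q in queues.values() if q), default=float("-inf"))
        if gain >= rest:
            # No other user's bound exceeds u's exact gain, so u is the greedy choice.
            seeds.append(u)
            chosen[c].add(u)
        else:
            # Stale bound: reinsert u keyed by its fresh (smaller) marginal gain.
            heapq.heappush(queues[c], (-gain, u))
    return seeds


# Hypothetical toy data: within-community coverage weighted by preferences.
reach = {"a": {1, 2}, "b": {2}, "c": {3, 4, 5}, "d": {5, 6}}
gamma = {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.5, 5: 1.0, 6: 0.5}
comm_of = {"a": 0, "b": 0, "c": 1, "d": 1}


def toy_marginal(u: str, chosen_in_comm: FrozenSet[str]) -> float:
    covered = set().union(*(reach[s] for s in chosen_in_comm))
    return sum(gamma[v] for v in reach[u] - covered)


comm_influence = {u: toy_marginal(u, frozenset()) for u in reach}
print(css_select(comm_of, comm_influence, toy_marginal, 3))  # ['a', 'c', 'd']
```

In the third round `d`'s stored bound (1.5) turns out to be stale (its true gain is 0.5), so it is reinserted; `b` is then checked and found to add nothing, after which `d` is selected with only two extra marginal evaluations instead of a full pass over all users.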

In summary, the major contributions of our work are listed as follows:

• We formally introduce the location aware influence maximization (LAIM) problem, which is NP-hard, and propose an efficient solution to it.
• To efficiently identify the targeted users and compute their preferences for given queries, we propose a PR-tree index structure, in which each tree node stores location information and users' geographical preferences.
• Based on spectral clustering, we devise a social influence based community detection algorithm that adopts the MIA model. In addition, we propose an efficient community-based seeds selection algorithm that utilizes offline community-based influence indexes and online community-based priority queues.
• We conduct a comprehensive performance evaluation on real-world datasets. Experimental results show the effectiveness and efficiency of our proposed algorithm.
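The social influence based community detection step can be illustrated with the sketch below, under several stated assumptions. The text does not give the exact similarity definition, so here the similarity of `s` to `t` is assumed to be the propagation probability of the maximum influence path (MIP) from `s` to `t`, set to 0 when it falls below the MIA threshold `theta`; the MIP is found with Dijkstra on `-log p` edge weights, since maximizing a product of probabilities equals minimizing the sum of their negative logarithms. The paper then applies spectral clustering for weighted directed graphs; this sketch deliberately simplifies that step by symmetrizing the similarity matrix and splitting the nodes into two groups by the sign of the Fiedler vector of the normalized Laplacian. A k-way version would instead cluster the first k eigenvectors (e.g., with k-means). The function names are hypothetical.

```python
import heapq
import math

import numpy as np


def mip_probabilities(n, edges, theta):
    """pp[s, t] = propagation probability of the maximum influence path s -> t,
    or 0 if it falls below the MIA threshold theta.
    edges maps a directed pair (u, v) to its propagation probability p in (0, 1]."""
    adj = [[] for _ in range(n)]
    for (u, v), p in edges.items():
        adj[u].append((v, -math.log(p)))  # max product of p == min sum of -log p
    pp = np.zeros((n, n))
    for s in range(n):
        # Dijkstra from s over the -log p weights.
        dist = [math.inf] * n
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        for t in range(n):
            if t != s and dist[t] < math.inf:
                prob = math.exp(-dist[t])
                if prob >= theta:
                    pp[s, t] = prob
    return pp


def spectral_bipartition(pp):
    """Two-way split by the sign of the Fiedler vector of the normalized
    Laplacian of the symmetrized similarity matrix (a simplification)."""
    a = (pp + pp.T) / 2.0  # drop edge direction
    deg = a.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    lap = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return (vecs[:, 1] >= 0).astype(int)


# Two triangles joined by a weak bridge 2 <-> 3.
edges = {}
for tri in ((0, 1, 2), (3, 4, 5)):
    for u in tri:
        for v in tri:
            if u != v:
                edges[(u, v)] = 0.5
edges[(2, 3)] = edges[(3, 2)] = 0.05
labels = spectral_bipartition(mip_probabilities(6, edges, theta=0.01))
print(labels)  # one triangle is labeled 0, the other 1 (eigenvector sign is arbitrary)
```

Because the within-triangle similarities (0.5) dominate the cross-triangle ones (at most 0.05), the two triangles come out as separate communities.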

The remainder of this paper is organized as follows. We review the related work in Section 2. Section 3 presents necessary background on influence maximization and influence spread computation. In Section 4, we formulate our location aware influence maximization (LAIM) problem. In Section 5, we present our community-based seeds selection solution, followed by experimental evaluation in Section 6. Finally, Section 7 concludes the paper.

Section snippets

Influence maximization

Influence maximization in social networks. Influence maximization (IM) was first described as an algorithmic problem by Domingos and Richardson [9]. Then, Kempe et al. [1] formulated the IM problem as a discrete optimization problem. They proved that this problem is NP-hard and gave a greedy algorithm with a provable approximation guarantee (i.e., $1 - 1/e$). However, the greedy algorithm needs to execute Monte Carlo simulations [10] to obtain this approximation ratio, which faces the drawback of high

Influence maximization (IM)

A social network is modeled as a directed graph $G = (V, E)$, where the nodes in $V = \{v_1, v_2, \dots, v_n\}$ model the users in the network and the edges in $E$ model the friendships or follow relationships between them. Influence maximization (IM) can be formally defined as follows [1].

Definition 1

Given a social network $G = (V, E)$, a specific propagation model $C$, and an integer $k$, the influence maximization (IM) problem is to find a set of seed nodes $S$ in $G$ with $|S| = k$ such that, under model $C$, the influence spread of $S$, denoted

Query model

A location aware influence maximization (LAIM) query $Q$ consists of a region $R$ and a budget $k$ (i.e., the number of seeds), denoted by $Q = (R, k)$.

Data model

Users in the social network have geographical preferences. For a user $v$, we define his geographical preference on $Q$ as the ratio $\gamma(v, Q)$ ($0 \le \gamma(v, Q) \le 1$) of $v$'s check-ins located in $R$ to his total check-ins: $\gamma(v, Q) = \frac{\sum_{l \in C(v) \cap R} n_v(l)}{\sum_{l \in C(v)} n_v(l)}$, where $C(v)$ is the set of locations at which user $v$ has checked in and $n_v(l)$ denotes the number of check-ins of $v$ at location $l$.
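The preference ratio translates directly into code. The sketch below assumes, for illustration only, that the query region $R$ is an axis-aligned rectangle and that a user's check-ins are given as a map from location to count (the hypothetical `checkins` argument); users with no check-ins are assigned a preference of 0, which is also an assumption since the formula is undefined for them.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed rectangular R: (xmin, ymin, xmax, ymax)


def geo_preference(checkins: Dict[Point, int], region: Rect) -> float:
    """gamma(v, Q): share of v's check-ins whose location l lies in region R.

    checkins maps each location l in C(v) to n_v(l)."""
    total = sum(checkins.values())  # denominator: sum over l in C(v)
    if total == 0:
        return 0.0  # assumption: no check-ins means no preference
    xmin, ymin, xmax, ymax = region
    local = sum(n for (x, y), n in checkins.items()  # numerator: l in C(v) ∩ R
                if xmin <= x <= xmax and ymin <= y <= ymax)
    return local / total


v_checkins = {(1.0, 1.0): 3, (2.0, 0.5): 1, (8.0, 8.0): 4}
print(geo_preference(v_checkins, (0.0, 0.0, 3.0, 3.0)))  # 0.5
```

Here two of the three locations fall inside $R$, so $\gamma(v, Q) = (3 + 1)/(3 + 1 + 4) = 0.5$.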

Location aware influence maximization

Definition 2

Our proposed solution

An overall view of our proposed solution is illustrated in Fig. 2. Our framework consists of three components: the PR-tree index structure, the community detection algorithm, and the community-based seeds selection algorithm.

PR-tree index structure. Based on the R-tree, we propose a PR-tree index structure to find targeted users efficiently. Each node of the PR-tree index stores location information and users' geographical preferences. For online queries, we traverse the PR-tree from the root in

Experiments

In this section, we study the performance of our method and compare our method with the state-of-the-art algorithms on real-world datasets.

Conclusion

In this paper, we study the location aware influence maximization (LAIM) problem in social networks. Under the MIA model, we prove that this problem is NP-hard and that the influence spread is monotone and submodular. To obtain the targeted users (i.e., those who have geographical preferences on the query region), we devise a PR-tree index structure. In addition, to find seeds efficiently for the LAIM problem, we propose a community-based seeds selection (CSS) algorithm, which iteratively selects users

Acknowledgment

The work was supported by the National Natural Science Foundation of China under grant 61502047, the Co-construction Program with the Beijing Municipal Commission of Education.

Xiao Li received her M.S. degree from Shandong Normal University in 2014. She is currently a Ph.D. candidate at Beijing University of Posts and Telecommunications, China. Her major is Computer Science. Her research interests include social influence analysis and machine learning.

References (51)

X. Wang et al.
Distance-aware influence maximization in geo-social network
Proceedings of the IEEE International Conference on Data Engineering
(2016)

Y. Zhao et al.
Identification of influential nodes in social networks with community structure based on label propagation
Neurocomputing
(2016)

Z. Wang et al.
Ranking influential nodes in social networks based on node position and neighborhood
Neurocomputing
(2017)

F. Lu et al.
Scalable influence maximization under independent cascade model
J. Netw. Comput. Appl.
(2017)

K. Zhang et al.
Maximizing influence in a social network: improved results using a genetic algorithm
Phys. A Stat. Mech. Appl.
(2017)

D. Kim et al.
Influence maximization based on reachability sketches in dynamic graphs
Inf. Sci.
(2017)

M. Guerrero et al.
Adaptive community detection in complex networks using genetic algorithms
Neurocomputing
(2017)

K. Xu et al.
Mining community and inferring friendship in mobile social networks
Neurocomputing
(2016)

M.E. Newman
Modularity and community structure in networks
Proc. Natl. Acad. Sci.
(2006)

X. Bai et al.
An overlapping community detection algorithm based on density peaks
Neurocomputing
(2017)

T. Ma et al.
Led: a fast overlapping communities detection algorithm based on structural clustering
Neurocomputing
(2016)

X. Wang et al.
Uncovering fuzzy communities in networks with structural similarity
Neurocomputing
(2016)

R. Ghosh et al.
Community detection using a measure of global influence
Advances in Social Network Mining and Analysis
(2010)

A. Guttman
R-trees: a dynamic index structure for spatial searching
Proceedings of the SIGMOD
(1984)

M. Meilă et al.
Clustering by weighted cuts in directed graphs
Proceedings of the SIAM Conference on Data Mining (SDM)
(February 2007)

D. Kempe et al.
Maximizing the spread of influence through a social network
Proceedings of the 2003 SIGKDD
(2003)

J. Leskovec et al.
Cost-effective outbreak detection in networks
Proceedings of the 2007 SIGKDD
(2007)

W. Chen et al.
Scalable influence maximization for prevalent viral marketing in large-scale social networks
Proceedings of the 2010 SIGKDD
(2010)

C. Borgs, M. Brautbar, J.T. Chayes, B. Lucier, Influence maximization in social networks: towards an optimal...

W. Chen et al.
Efficient influence maximization in social networks
Proceedings of the SIGKDD
(2009)

T. Zhou et al.
Location-based influence maximization in social networks
Proceedings of the Conference on Information and Knowledge Management
(2015)

G. Li et al.
Efficient location-aware influence maximization
Proceedings of the 2014 SIGMOD
(2014)

P. Domingos et al.
Mining the network value of customers
Proceedings of the 2001 SIGKDD
(2001)

D.J. MacKay
Introduction to Monte Carlo methods
Learning in Graphical Models
(1998)

A. Goyal et al.
CELF++: optimizing the greedy algorithm for influence maximization in social networks
Proceedings of the 2011 WWW
(2011)

Cited by (73)

Influence Maximization in social networks using discretized Harris’ Hawks Optimization algorithm
2023, Applied Soft Computing
Influence Maximization (IM) is the task of determining $k$ optimal influential nodes in a social network to maximize the influence spread using a propagation model. IM is a prominent problem for viral marketing and helps significantly in social media advertising. Previous Greedy and Reverse Influence Sampling-based IM approaches are ineffective in real-world social networks due to their significant computational cost and execution time. Further, even heuristic approaches applied to IM generally yield minimal performance gain relative to the decreased time complexity. This presents a challenge in developing cost-effective algorithms with low execution time that can handle diverse social networks. In this paper, we propose the discretization of the nature-inspired Harris’ Hawks Optimization meta-heuristic algorithm using community structures for optimal selection of seed nodes for influence spread. In addition to Harris’ Hawks’ intelligence, we employ a neighbor scout strategy algorithm to avoid blindness and enhance the searching ability of the hawks. Further, we use a candidate nodes-based random population initialization approach, and these candidate nodes aid in accelerating the convergence process for the entire populace, reducing the total computational cost. We evaluate the efficacy of our proposed DHHO approach on eight social networks using the Independent Cascade model for information diffusion. We observe that DHHO is comparable or better than competing meta-heuristic approaches for Influence Maximization across five metrics, and performs noticeably better than competing heuristic approaches.
A survey of graph neural network based recommendation in social networks
2023, Neurocomputing
With the widespread popularization of social network platforms, user-generated content and other social network data are growing rapidly. It is difficult for social users to select interested contents from the numerous social data. To alleviate information overload problem and enhance overall user experience of social networks, recommendation systems relying on historical behavioural data and social friendship relations of users, are widely used in social networks. Although researches on social recommendations have been conducted in recent years, recommendation systems of social networks still suffer from several challenges, such as data sparsity and lower performance. Since graph neural network has huge advantages in graph data learning by aggregating neighbors representations of the central node, it has been gathering pace in recent years. In this survey, we review graph neural network based literature for solving recommendation problems in social networks. We first introduce backgrounds of graph neural network and recommendation systems in social networks. Then, for different types of recommendation problems in social networks, we review different graph neural network based recommendation methods briefly. In particular, we first review GNN-based methods for general social recommendation and then review GNN-based methods for different social recommendation scenarios (such as friend recommendation and point-of-interest recommendation). Finally, we briefly discuss promising future directions of the graph neural network based recommendation in social networks.
TSIFIM: A three-stage iterative framework for influence maximization in complex networks
2023, Expert Systems with Applications
Citation Excerpt :
Based on community structure, Shang et al. (2017) provided an influence maximization framework for determining the most influential node set in large-scale complex networks. Li, Cheng et al. (2018) proposed the community-based seeds selection algorithm for solving IM problem, which utilizes relative location of nodes to divide the community structure of the network by employing spectral clustering algorithm. A community closeness-based influence maximization algorithm simultaneously considers the number of nodes and the density of edges in each community to find out the most influential nodes (Wu et al., 2020).
The problem of influence maximization is a classic issue that has been well-studied in the field of network science, but most of existing researches are compromising among computational complexity or result accuracy. In this work, a three-stage iterative framework for influence maximization (TSIFIM) is presented to find a set of seed spreaders in complex networks. In TSIFIM, the initial candidate seeds are first selected by considering the global communicability of each node and its importance in their local network. Then, in addition to the candidate seeds, other remained nodes are assigned to the specific communities based on the proposed local resource allocation similarity index, and the core node in each community which satisfies the local influence threshold condition are selected as the supplementary candidate seeds. Furthermore, we employ an adaptive search strategy to find the optimal solution among these candidates. The proposed algorithm is compared with eight popular influence maximization algorithms on nine real-world networks to verify the performance. Experimental results show that TSIFIM has better performance in terms of influence spreading, sensitivity analysis, seed dispersion and statistical test.
Dynamic node influence tracking based influence maximization on dynamic social networks
2022, Microprocessors and Microsystems
Citation Excerpt :
It includes heuristic methods such as DegreeDiscount (DD) algorithm [13], and LTR method [14], path-based methods such as matrix influence (MATI) [15], random sampling-based methods such as RIS [16], TIM+ [17], TPH [18], DIMM [19]. The IM has also been studied in realistic situations such as community-based influence maximization [20], location-aware [21], and context-aware [22]. However, most of these methods consider that social networks are static and ignore the fact that most real-world networks are dynamic and evolve over time.
Influence maximization is one of the popular problems in social network analysis. Its application includes viral marketing, epidemic control, and recommender systems. Most of the existing methods are applicable to static networks. However, many real-world networks are dynamic and they evolve with time. This paper studies influence maximization on dynamic social networks and proposes Dynamic Influence based Seed Selection (DYISSE) method. To find ‘ $k$ ’ most influential nodes, the proposed method first estimates the influence of each node by introducing a Two-hop Triangular Influence (TTI) that measures the influence strength of each node by utilizing the property of triangles. Based on the TTI, a method named the Dynamic Tractable Set (D-TeSt) is proposed to track the changes in the influence of individual nodes when the topology of the network changes with time. The performance of the DYISSE method is analyzed under the LAIC model on two real-world and twelve synthetic dynamic networks. The results show that the proposed method performs better than the temporal versions of DegreeDiscount, MaxDegree, K-shell, Random, and Closeness centrality measures in extracting initial influential nodes.
CBIM: Community-based influence maximization in multilayer networks
2022, Information Sciences
Selecting seed nodes (most influential nodes) in networks has attracted attention due to seed nodes’ ability to influence and spread information. Seed nodes are essential to understanding the spreading and controlling of the information dynamics of the networks. Influence maximization (IM) is predominant in monolayer networks. After the advancements and widespread usage of social networks, applying influence maximization to multilayer networks is gaining popularity. Identifying influential nodes precisely in multilayer networks is a challenging and yet unexplored task. Based on studies, individuals in a community interact frequently and are more likely to influence each other. Motivated by this observation, this paper proposes community-based influence maximization (CBIM) model to find k seed nodes in multilayer networks. CBIM has two phases: In the first phase, CBIM uses the function $FIC (M)$ to find the small communities from a multilayer network based on dice neighborhood similarity. It uses the function $CSC ({CS}_{init}, θ)$ to merge smaller communities and generate larger communities to improve communities’ quality. In the second phase, CBIM computes Edge Weight Sum (EWS) for each node in a community and ranks the nodes based on EWS. CBIM uses the quota-based approach to select the seed node set from the communities based on the ranks. A comparative study of various influence maximization (IM) algorithms shows that the CBIM algorithm performs better than the state-of-the-art. The simulation studies have shown that CBIM can detect a set of most influential nodes on real-time datasets under various settings and environments.
Random walk-based algorithm for distance-aware influence maximization on multiple query locations
2022, Knowledge-Based Systems
Citation Excerpt :
To obtain precise results for the DIM-MQL problem, the influence scope estimation must combine the geographic information with the network topology structure, which is unconcerned with the abovementioned IM methods. Because the methods [20–25] for the LIM problem maximize the influence propagation in a given query region, they addressed users in the social network equally and ignored the distance between users and the promoted location. Concerning the distance to the query location, the seed set determined by these methods cannot obtain the maximum influence spreading.
The problem with distance-aware influence maximization on multiple query locations (DIM-MQL) is selecting a group of nodes in the network to influence the nodes in the widest range possible near multiple query locations. A random walk-based algorithm for the DIM-MQL problem is presented. To accelerate query processing in real-time, our method involves offline and online processing. Offline processing conducts computations that are independent of the queries, and online processing answers queries in real-time. For offline processing, an algorithm is presented to estimate the upper and lower bounds of the influence spreading of the nodes based on a set of anchor points. We propose an algorithm to sample the influence spreading paths and estimate the influence spreading of the nodes. The number of samples required is analyzed and estimated. Based on the random walk approach, an algorithm is proposed to select anchor points by partitioning the nodes into groups. An algorithm is presented for seed selection in online processing. Based on the spreading bounds obtained in offline processing, a pruning technique is employed to accelerate query processing. Our empirical results show that the proposed algorithm can obtain a larger distance-aware influence spreading than other approaches.


Xiang Cheng received the Ph.D. degree from Beijing University of Posts and Telecommunications, China, in 2013. He is currently an Associate Professor at the Beijing University of Posts and Telecommunications. His research interests include data mining and data privacy.

Sen Su received the Ph.D. degree in Computer Science from the University of Electronic Science and Technology, China, in 1998. He is currently a Professor at the Beijing University of Posts and Telecommunications. His research interests include distributed systems and service computing.

Chenna Sun received the B.S. degree from Beijing University of Posts and Telecommunications, China, in 2016. She is currently an M.S. candidate at Beijing University of Posts and Telecommunications. Her major is Computer Science. Her research interests include social network analysis and machine learning.
