Community-based seeds selection algorithm for location aware influence maximization
Introduction
In recent years, social networks have become prevalent platforms for product promotion (e.g., viral marketing). Previous studies have shown that viral marketing is more effective than TV or newspaper advertising. Influence maximization, the key problem behind viral marketing in social networks [1], aims to find a given number of users (called seeds) who maximize the expected number of influenced users (called the influence spread) through the word-of-mouth effect; it has been extensively studied recently [2], [3], [4], [5]. With the proliferation of geo-social networks (such as Foursquare and Facebook), location-based product promotion is becoming increasingly necessary in real applications. For instance, a newly opened restaurant in Chelsea, New York, wants to promote itself on a social network platform through viral marketing. Its strategy is to provide free meals to a limited number k of users who can maximize the influence spread over the targeted users through the word-of-mouth effect, attracting them to dine there. The targeted users of this promotion are those who have geographical preferences for Chelsea, New York (i.e., those who frequently dine near Chelsea, New York). In this promotion, how to select these k users is critical.
Recently, some researchers have taken the location of the promotion into consideration and extended traditional influence maximization to location aware influence maximization. Zhou et al. [6] take users’ historical mobility behaviors into consideration and devise two heuristic algorithms that find seeds using their proposed two-phase information diffusion model. However, they do not consider the co-influences of the selected seeds. Li et al. [7] study how to find the k users who have the highest influence over a group of users in a specified region. Wang et al. [8] define distance-aware influence maximization, which considers the distance between users and the promoted location. However, these studies all assume that each user has a known, fixed location, which is not realistic. In fact, users in social networks usually check in at multiple places and have geographical preferences for multiple locations.
Different from these works, in this paper we define a location aware influence maximization (LAIM) problem. In particular, given a social network in which each user has different preferences for different locations, and a query with a spatial region R and an integer k, LAIM requires finding k seeds that maximize the influence spread over the targeted users, i.e., those who have geographical preferences for the query region R. We show that, under the maximum influence arborescence (MIA) model, the LAIM problem is NP-hard and its influence spread is submodular and monotone. Therefore, we could extend the existing greedy algorithm [3], which has a provable approximation ratio, to solve this problem. However, since the extended greedy algorithm must compute the exact marginal influence of each user and select the user with the largest marginal influence in each iteration, it suffers from poor efficiency. We therefore focus on designing an efficient solution for the LAIM problem.
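For reference, the two properties named above have the following standard definitions. The symbol σ(S) for the influence spread of a seed set S over the node set V is an assumption made here for illustration, and the last line is the selection rule of a generic greedy step rather than a formula taken from this paper.

```latex
\begin{align*}
\text{Monotone:}    \quad & \sigma(S) \le \sigma(T)
    && \text{for all } S \subseteq T \subseteq V,\\
\text{Submodular:}  \quad & \sigma(S \cup \{v\}) - \sigma(S) \ge \sigma(T \cup \{v\}) - \sigma(T)
    && \text{for all } S \subseteq T \subseteq V,\ v \in V \setminus T,\\
\text{Greedy step:} \quad & v^{*} = \arg\max_{v \in V \setminus S} \bigl[\sigma(S \cup \{v\}) - \sigma(S)\bigr].
\end{align*}
```

For a monotone, submodular objective under a cardinality constraint, k repetitions of this step are known to achieve at least a 1 − 1/e fraction of the optimal value when the marginal gains are computed exactly.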
There are two main challenges in solving the LAIM problem efficiently. The first is how to efficiently identify the targeted users and compute their geographical preferences for given queries. Thanks to the check-in information on social networks (e.g., Twitter and Foursquare), we can easily obtain users’ geographical preferences. Leveraging such information, we devise a PR-tree index structure for finding the targeted users, in which each tree node stores the location and information about users’ geographical preferences. In particular, for a given query, we traverse the PR-tree from the root in depth-first order and prune the tree nodes whose regions do not intersect the query region, so that the targeted users are found efficiently.
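The depth-first traversal with pruning described above can be sketched roughly as follows. This is a minimal illustration and not the paper's PR-tree: the `Node` class, its `mbr`, `children` and `entries` fields, and the rectangle format `(xmin, ymin, xmax, ymax)` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

def intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(rect, point):
    """True if the (x, y) point lies inside rect."""
    return rect[0] <= point[0] <= rect[2] and rect[1] <= point[1] <= rect[3]

@dataclass
class Node:
    # Hypothetical node layout: a bounding rectangle, child nodes for
    # internal nodes, and (location, {user: check-in count}) pairs for leaves.
    mbr: tuple
    children: list = field(default_factory=list)
    entries: list = field(default_factory=list)

def checkins_in_region(root, region):
    """Depth-first search that skips subtrees not intersecting the query region."""
    hits = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if not intersects(node.mbr, region):
            continue  # prune: nothing below this node can lie in the region
        for point, per_user in node.entries:
            if contains(region, point):
                for user, count in per_user.items():
                    hits[user] = hits.get(user, 0) + count
        stack.extend(node.children)
    return hits

# Toy index with two leaves; only the first one overlaps the query region.
leaf_a = Node(mbr=(0, 0, 2, 2), entries=[((1, 1), {"u1": 3, "u2": 1})])
leaf_b = Node(mbr=(8, 8, 9, 9), entries=[((8, 9), {"u2": 4})])
root = Node(mbr=(0, 0, 9, 9), children=[leaf_a, leaf_b])
result = checkins_in_region(root, (0, 0, 3, 3))
```

In this toy index the second leaf is never opened, because its bounding rectangle does not intersect the query region.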
The second challenge is to devise an algorithm for efficient seed selection. To address this challenge, we propose a community-based seeds selection (CSS) algorithm. The basic idea is to divide the whole network into communities and then find seeds within communities instead of in the whole network, i.e., we use users’ influences within their communities to approximate their influences in the whole network. Our assumption is that each node’s influence propagation is limited to the community in which it resides. In particular, we define a community as a group of users who are in frequent contact and are more likely to influence each other within the group than outside of it. To find good communities, we propose a community detection algorithm that first computes social-influence-based similarities under the maximum influence arborescence (MIA) diffusion model and then applies a spectral clustering algorithm for weighted directed graphs. To find the seeds efficiently, based on the detected communities, our CSS algorithm first constructs PR-tree-based indexes offline to store users’ community-based influences. Then, for a given query, it assembles the corresponding indexes online to construct a priority queue for each community, which stores the community-based influences of users in descending order. Finally, based on the community-based priority queues, it finds seeds efficiently online by preferentially computing the marginal influences of the users who are most likely to be selected as seeds. In particular, we devise a fast method to compute an upper bound on the marginal influence of each user (i.e., the estimated marginal influence), which saves time during seed selection.
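The idea of using upper bounds held in a priority queue to avoid recomputing every marginal influence can be illustrated with a generic CELF-style lazy greedy loop. This sketch is not the CSS algorithm itself: it uses a single queue rather than one per community, and the `coverage` function is a toy stand-in for influence spread, chosen only because it is monotone and submodular. All names are hypothetical.

```python
import heapq

def lazy_greedy(candidates, spread, k):
    """Select up to k seeds, treating stale marginal gains as upper bounds.

    Valid when `spread` is submodular: a gain computed against an earlier,
    smaller seed set can only overestimate the current gain.
    """
    seeds = []
    current = spread(seeds)
    # Initial upper bound for each candidate: its gain on an empty seed set.
    heap = [(-(spread([u]) - current), u) for u in candidates]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        _, u = heapq.heappop(heap)
        gain = spread(seeds + [u]) - current  # exact marginal gain right now
        if not heap or gain >= -heap[0][0]:
            # No other candidate's upper bound beats this exact gain.
            seeds.append(u)
            current += gain
        else:
            heapq.heappush(heap, (-gain, u))  # re-queue with the tighter bound
    return seeds

# Toy stand-in for influence spread: number of distinct users reached.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(seeds):
    return len(set().union(*(reach[s] for s in seeds)))

picked = lazy_greedy(reach, coverage, 2)
```

Only the candidate at the head of the queue is re-evaluated in each round, which is the source of the savings compared with recomputing the marginal gain of every user.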
In summary, the major contributions of our work are listed as follows:
- We formally introduce the location aware influence maximization (LAIM) problem, which is NP-hard, and propose an efficient solution to solve this problem.
- To efficiently identify the targeted users and compute their preferences for given queries, we propose a PR-tree index structure, in which each tree node stores the location and information of users’ geographical preferences.
- Based on the spectral clustering, we devise a social influence based community detection algorithm by adopting the MIA model. In addition, we propose an efficient community based seeds selection algorithm by utilizing the offline community-based influence indexes and the online community-based priority queues.
- We conduct a comprehensive performance evaluation on real-world datasets. Experimental results show the effectiveness and efficiency of our proposed algorithm.
The remainder of this paper is organized as follows. We review the related work in Section 2. Section 3 presents necessary background on influence maximization and influence spread computation. In Section 4, we formulate our location aware influence maximization (LAIM) problem. In Section 5, we present our community-based seeds selection solution, followed by experimental evaluation in Section 6. Finally, Section 7 concludes the paper.
Influence maximization
Influence maximization in social networks. Influence maximization (IM) was first described as an algorithmic problem by Domingos and Richardson [9]. Kempe et al. [1] then formulated the IM problem as a discrete optimization problem. They proved that this problem is NP-hard and gave a greedy algorithm with a provable approximation guarantee. However, the greedy algorithm needs to run Monte Carlo simulations [10] to obtain the approximation ratio, which faces the drawback of high
Influence maximization (IM)
A social network is modeled as a directed graph G = (V, E), where nodes in V model the users in the network and edges in E model the friendships or follow relationships between them. Influence maximization (IM) can be formally defined as follows [1].
Definition 1 Given a social network G = (V, E), a specific propagation model C and an integer k, the influence maximization (IM) problem is to find a set S of k seed nodes in G such that, under model C, the influence spread of S, denoted
Query model
A location aware influence maximization (LAIM) query Q consists of a region R and a budget number k (i.e., the number of seeds), denoted by Q = (R, k).
Data model
Users in the social network have geographical preferences. For user v, we define the ratio γ(v, Q) (0 ≤ γ(v, Q) ≤ 1) of v’s check-ins inside R to v’s total check-ins as the user’s geographical preference on Q:
$$\gamma(v, Q) = \frac{\sum_{l \in C(v),\ l \in R} n_v(l)}{\sum_{l \in C(v)} n_v(l)}$$
where C(v) is the set of locations at which user v has checked in and n_v(l) denotes the number of check-ins of v at location l.
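The ratio described above can be computed directly from a user's check-in log. The following is a minimal sketch under stated assumptions: check-ins are given as a list of (x, y) points with one entry per check-in, the region R is an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`, and a user with no check-ins is assigned a preference of 0. None of these choices is specified in the text.

```python
from collections import Counter

def geographic_preference(checkins, region):
    """Share of a user's check-ins that fall inside the query region.

    checkins: iterable of (x, y) points, one per check-in (assumed format).
    region:   (xmin, ymin, xmax, ymax) rectangle (assumed shape of R).
    """
    counts = Counter(checkins)       # check-in count per distinct location
    total = sum(counts.values())
    if total == 0:
        return 0.0                   # assumption: no check-ins means no preference
    xmin, ymin, xmax, ymax = region
    inside = sum(n for (x, y), n in counts.items()
                 if xmin <= x <= xmax and ymin <= y <= ymax)
    return inside / total

# Four check-ins, three of them inside the rectangle (0, 0, 5, 5).
example = [(1, 1), (1, 1), (2, 3), (9, 9)]
gamma = geographic_preference(example, (0, 0, 5, 5))
```

Here `gamma` evaluates to 0.75, since three of the four check-ins lie inside the query rectangle.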
Location aware influence maximization
Definition 2
Our proposed solution
An overall view of our proposed solution is illustrated in Fig. 2. Our framework consists of three components: PR-tree index structure, community detection algorithm and community-based seeds selection algorithm.
PR-tree index structure. Based on R-tree, we propose a PR-tree index structure to find targeted users efficiently. Each node of the PR-tree index stores the location and information of users’ geographical preferences. For online queries, we traverse the PR-tree from the root in
Experiments
In this section, we study the performance of our method and compare our method with the state-of-the-art algorithms on real-world datasets.
Conclusion
In this paper, we study the location aware influence maximization (LAIM) problem in social networks. Under the MIA model, we prove that this problem is NP-hard and that the influence spread is monotone and submodular. To obtain the targeted users (i.e., those who have geographical preferences for the query region), we devise a PR-tree index structure. In addition, to find seeds efficiently for the LAIM problem, we propose a community-based seeds selection (CSS) algorithm, which iteratively selects users
Acknowledgment
The work was supported by the National Natural Science Foundation of China under grant 61502047 and the Co-construction Program with the Beijing Municipal Commission of Education.
Xiao Li received her M.S. degree from Shandong Normal University in 2014. She is currently a Ph.D. candidate at Beijing University of Posts and Telecommunications, China. Her major is Computer Science. Her research interests include social influence analysis and machine learning.
References (51)
- Distance-aware influence maximization in geo-social network. Proceedings of the IEEE International Conference on Data Engineering (2016).
- Identification of influential nodes in social networks with community structure based on label propagation. Neurocomputing (2016).
- Ranking influential nodes in social networks based on node position and neighborhood. Neurocomputing (2017).
- Scalable influence maximization under independent cascade model. J. Netw. Comput. Appl. (2017).
- Maximizing influence in a social network: improved results using a genetic algorithm. Phys. A Stat. Mech. Appl. (2017).
- Influence maximization based on reachability sketches in dynamic graphs. Inf. Sci. (2017).
- Adaptive community detection in complex networks using genetic algorithms. Neurocomputing (2017).
- Mining community and inferring friendship in mobile social networks. Neurocomputing (2016).
- Modularity and community structure in networks. Proc. Natl. Acad. Sci. (2006).
- An overlapping community detection algorithm based on density peaks. Neurocomputing (2017).
- LED: a fast overlapping communities detection algorithm based on structural clustering. Neurocomputing.
- Uncovering fuzzy communities in networks with structural similarity. Neurocomputing.
- Community detection using a measure of global influence. Advances in Social Network Mining and Analysis.
- R-trees: a dynamic index structure for spatial searching. Proceedings of the SIGMOD.
- Clustering by weighted cuts in directed graphs. Proceedings of the SIAM Conference on Data Mining (SDM).
- Maximizing the spread of influence through a social network. Proceedings of the 2003 SIGKDD.
- Cost-effective outbreak detection in networks. Proceedings of the 2007 SIGKDD.
- Scalable influence maximization for prevalent viral marketing in large-scale social networks. Proceedings of the 2010 SIGKDD.
- Efficient influence maximization in social networks. Proceedings of the SIGKDD.
- Location-based influence maximization in social networks. Proceedings of the Conference on Information and Knowledge Management.
- Efficient location-aware influence maximization. Proceedings of the 2014 SIGMOD.
- Mining the network value of customers. Proceedings of the 2001 SIGKDD.
- Introduction to Monte Carlo methods. Learning in Graphical Models.
- CELF++: optimizing the greedy algorithm for influence maximization in social networks. Proceedings of the 2011 WWW.
Xiang Cheng received the Ph.D. degree from Beijing University of Posts and Telecommunications, China, in 2013. He is currently an Associate Professor at the Beijing University of Posts and Telecommunications. His research interests include data mining and data privacy.
Sen Su received the Ph.D. degree in Computer Science from the University of Electronic Science and Technology, China, in 1998. He is currently a Professor at the Beijing University of Posts and Telecommunications. His research interests include distributed systems and service computing.
Chenna Sun received the B.S. degree from Beijing University of Posts and Telecommunications, China, in 2016. She is currently an M.S. candidate at Beijing University of Posts and Telecommunications. Her major is Computer Science. Her research interests include social network analysis and machine learning.